
(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(19) World Intellectual Property Organization 

International Bureau 

(43) International Publication Date 
17 July 2003 (17.07.2003) 




PCT 




(10) International Publication Number 

WO 03/056914 A1



(51) International Patent Classification 7 : A01K 67/027, 
C12N9/10, 1/04, 1/16, 1/18 

(21) International Application Number: PCT/US02/41510 

(22) International Filing Date: 

24 December 2002 (24.12.2002) 



(25) Filing Language: English

(26) Publication Language: English



(30) Priority Data: 

60/344,169 27 December 2001 (27.12.2001) US 

(71) Applicant (for all designated States except US): GLYCOFI, INC.
[US/US]; 21 Lafayette Street, Suite 200,
Lebanon, NH 03766 (US).

(72) Inventors; and 

(75) Inventors/Applicants (for US only): WILDT, Stefan 
[US/US]; 32 Parkhurst Street, Lebanon, NH 03766 (US). 
MIELE, Robert, Gordon [US/US]; 4 Renihan Meadows,
Lebanon, NH 03766 (US). NETT, Juergen, Hermann 
[DE/US]; 11 Rocky Hill Road # 211, Enfield, NH 03748 
(US). DAVIDSON, Robert, C. [US/US]; 37 Landing 
Road # 2, Enfield, NH 03748 (US). 



(74) Agents: HALEY, James, F. et al.; Fish & Neave, 1251 
Avenue of the Americas, New York, NY 10020 (US). 

(81) Designated States (national): AE, AG, AL, AM, AT, AU,
AZ, BA, BB, BG, BR, BY, BZ, CA, CH, CN, CO, CR, CU,
CZ, DE, DK, DM, DZ, EC, EE, ES, FI, GB, GD, GE, GH,
GM, HR, HU, ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC,
LK, LR, LS, LT, LU, LV, MA, MD, MG, MK, MN, MW,
MX, MZ, NO, NZ, OM, PH, PL, PT, RO, RU, SC, SD, SE,
SG, SK, SL, TJ, TM, TN, TR, TT, TZ, UA, UG, US, UZ,
VC, VN, YU, ZA, ZM, ZW.

(84) Designated States (regional): ARIPO patent (GH, GM,
KE, LS, MW, MZ, SD, SL, SZ, TZ, UG, ZM, ZW),
Eurasian patent (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM),
European patent (AT, BE, BG, CH, CY, CZ, DE, DK, EE,
ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE, SI, SK,
TR), OAPI patent (BF, BJ, CF, CG, CI, CM, GA, GN, GQ,
GW, ML, MR, NE, SN, TD, TG).

Published: 

— with international search report 

— before the expiration of the time limit for amending the 
claims and to be republished in the event of receipt of 
amendments 

For two-letter codes and other abbreviations, refer to the "Guidance
Notes on Codes and Abbreviations" appearing at the beginning
of each regular issue of the PCT Gazette.



(54) Title: METHODS TO ENGINEER MAMMALIAN-TYPE CARBOHYDRATE STRUCTURES

(57) Abstract: The present invention relates to host cells having modified lipid-linked oligosaccharides which may be modified
further by heterologous expression of a set of glycosyltransferases, sugar transporters and mannosidases to become host-strains for
the production of mammalian, e.g., human therapeutic glycoproteins. The process provides an engineered host cell which can be
used to express and target any desirable gene(s) involved in glycosylation. Host cells with modified lipid-linked oligosaccharides
are created or selected. N-glycans made in the engineered host cells have a GlcNAcMan3GlcNAc2 core structure which may then be
modified further by heterologous expression of one or more enzymes, e.g., glycosyltransferases, sugar transporters and
mannosidases, to yield human-like glycoproteins. For the production of therapeutic proteins, this method may be adapted to engineer cell
lines in which any desired glycosylation structure may be obtained.



GF0021 






METHODS TO ENGINEER MAMMALIAN-TYPE CARBOHYDRATE 

STRUCTURES 

CROSS-REFERENCE TO RELATED APPLICATIONS 

[0001] This application claims priority to U.S. provisional application Ser. No.
60/344,169, filed Dec. 27, 2001, which is incorporated by reference herein in its
entirety.

FIELD OF THE INVENTION 

[0002] The present invention generally relates to modifying the glycosylation
structures of recombinant proteins expressed in fungi or other lower eukaryotes, to
more closely resemble the glycosylation of proteins of higher mammals, in
particular humans.

BACKGROUND OF THE INVENTION 

[0003] After DNA is transcribed and translated into a protein, further
post-translational processing involves the attachment of sugar residues, a process known
as glycosylation. Different organisms produce different glycosylation enzymes
(glycosyltransferases and glycosidases), and have different substrates (nucleotide
sugars) available, so that the glycosylation patterns as well as the composition of the
individual oligosaccharides, even of one and the same protein, will differ
depending on the host system in which the particular protein is expressed.
Bacteria typically do not glycosylate proteins, and if they do, only in a very unspecific
manner (Moens, 1997). Lower eukaryotes such as filamentous fungi and yeast add
primarily mannose and mannosylphosphate sugars, whereas insect cells such as
Sf9 cells glycosylate proteins in yet another way. See, for example (Bretthauer,
1999; Martinet, 1998; Weikert, 1999; Malissard, 2000; Jarvis, 1998; and Takeuchi,
1997).

[0004] Synthesis of a mammalian-type oligosaccharide structure consists of a
series of reactions in the course of which sugar residues are added and removed
while the protein moves along the secretory pathway in the host organism. The
enzymes which reside along the glycosylation pathway of the host organism or cell
determine the resulting glycosylation patterns of secreted proteins.
Unfortunately, the resulting glycosylation pattern of proteins expressed in lower
eukaryotic host cells differs substantially from the glycosylation found in higher
eukaryotes such as humans and other mammals (Bretthauer, 1999). Moreover, the
vastly different glycosylation pattern has, in some cases, been shown to increase
the immunogenicity of these proteins in humans and reduce their half-life
(Takeuchi, 1997). It would be desirable to produce human-like glycoproteins in
non-human host cells, especially lower eukaryotic cells.

[0005] The early steps of human glycosylation can be divided into at least two
different phases: (i) the assembly of lipid-linked Glc3Man9GlcNAc2 oligosaccharides
by a sequential set of reactions at the membrane of the endoplasmic reticulum (ER)
and (ii) the transfer of this oligosaccharide from the lipid anchor dolichyl
pyrophosphate onto de novo synthesized protein. The site of the specific transfer is
defined by an asparagine (Asn) residue in the sequence Asn-Xaa-Ser/Thr (see Fig.
1), where Xaa can be any amino acid except proline (Gavel, 1990). Further
processing by glucosidases and mannosidases occurs in the ER before the nascent
glycoprotein is transferred to the early Golgi apparatus, where additional mannose
residues are removed by Golgi-specific alpha (α)-1,2-mannosidases. Processing
continues as the protein proceeds through the Golgi. In the medial Golgi, a
number of modifying enzymes, including N-acetylglucosaminyltransferases (GnT
I, GnT II, GnT III, GnT IV, GnT V, GnT VI), mannosidase II and
fucosyltransferases, add and remove specific sugar residues (see, e.g., Figs. 2 and
3). Finally, in the trans-Golgi, galactosyltransferases and sialyltransferases produce
a glycoprotein structure that is released from the Golgi. It is this structure,
characterized by bi-, tri- and tetra-antennary structures, containing galactose,
fucose, N-acetylglucosamine and a high degree of terminal sialic acid, that gives
glycoproteins their human characteristics.
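
The acceptor-site rule stated above (Asn-Xaa-Ser/Thr, with Xaa being any residue except proline) can be illustrated with a short script. This is only a sketch for locating candidate sites in a one-letter amino acid sequence; the function name and the example peptide are hypothetical, and the presence of a sequon does not by itself establish that a site is glycosylated.

```python
import re

# Asn-Xaa-Ser/Thr, where Xaa is any residue except proline.
# A lookahead is used so that overlapping sequons (e.g. "NNSS") are all found.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(protein: str) -> list[int]:
    """Return 1-based positions of Asn residues that start an N-glycosylation sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


# Hypothetical peptide, invented purely for illustration.
example = "MKNGSAANPTLLNATQ"
print(find_sequons(example))  # prints [3, 13]; Asn8 is skipped because Xaa is proline
```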

[0006] In nearly all eukaryotes, glycoproteins are derived from the common core
oligosaccharide precursor Glc3Man9GlcNAc2-PP-Dol, where PP-Dol stands for
dolichol-pyrophosphate (Fig. 1). Within the endoplasmic reticulum, synthesis and
processing of dolichol pyrophosphate bound oligosaccharides are identical
between all known eukaryotes. However, further processing of the core
oligosaccharide by yeast, once it has been transferred to a peptide leaving the ER
and entering the Golgi, differs significantly from humans as it moves along the
secretory pathway and involves the addition of several mannose sugars.

[0007] In yeast, these steps are catalyzed by Golgi-residing
mannosyltransferases, like Och1p, Mnt1p and Mnn1p, which sequentially add
mannose sugars to the core oligosaccharide. The resulting structure is undesirable
for the production of humanoid proteins and it is thus desirable to reduce or
eliminate mannosyltransferase activity. Mutants of S. cerevisiae deficient in
mannosyltransferase activity (for example och1 or mnn9 mutants) have been
shown to be non-lethal and to display a reduced mannose content in the
oligosaccharide of yeast glycoproteins. Other oligosaccharide processing enzymes,
such as mannosylphosphate transferase, may also have to be eliminated depending
on the host's particular endogenous glycosylation pattern.

Lipid-Linked Oligosaccharide Precursors

[0008] Of particular interest for this invention are the early steps of
N-glycosylation (Figs. 1 and 2). The study of alg (asparagine-linked glycosylation)
mutants defective in the biosynthesis of the Glc3Man9GlcNAc2-PP-Dol has helped
to elucidate the initial steps of N-glycosylation.

[0009] The ALG3 gene of S. cerevisiae has been successfully cloned and knocked
out by deletion (Aebi, 1996). ALG3 has been shown to encode the enzyme
Dol-P-Man:Man5GlcNAc2-PP-Dol mannosyltransferase, which is involved in the first
Dol-P-Man dependent mannosylation step from Man5GlcNAc2-PP-Dol to
Man6GlcNAc2-PP-Dol at the luminal side of the ER (Sharma, 2001) (Figs. 1 and
2). S. cerevisiae cells harboring a leaky alg3-1 mutation accumulate
Man5GlcNAc2-PP-Dol (Structure I) (Huffaker, 1983).



Structure I: Man5GlcNAc2

Man5GlcNAc2 (Structure I) and Man8GlcNAc2 accumulate in total cell
mannoprotein of an och1 mnn1 alg3 mutant (Nakanishi-Shindo, 1993). This
S. cerevisiae och1, mnn1, alg3 mutant was shown to be viable, but
temperature-sensitive, and to lack α-1,6 polymannose outer chains.

[0010] In another study, secretory proteins expressed in a strain deleted for alg3
(Δalg3 background) were studied for their resistance to
Endo-β-N-acetylglucosaminidase H (Endo H) (Aebi, 1996). Previous observations have
indicated that only those oligosaccharides larger than Man5GlcNAc2 are
susceptible to cleavage by Endo H (Hubbard, 1980). In the alg3-1 phenotype,
some glycoforms were sensitive to Endo H cleavage, confirming its leakiness,
whereas in the Δalg3 mutant all glycoforms appeared to be resistant and of the
Man5-type (Aebi, 1996), suggesting a tight phenotype and transfer of
Man5GlcNAc2 oligosaccharide structures onto the nascent polypeptide chain. No
obvious phenotype was connected with the inactivation of the ALG3 gene (Aebi,
1996). Secreted exoglucanase produced in a Saccharomyces cerevisiae alg3
mutant was found to contain between 35-44% underglycosylated and
unglycosylated forms, and only about 50% of the transferred oligosaccharides
remained resistant to Endo H treatment (Cueva, 1996). Exoglucanase (Exg), an
enzyme that contains two potential N-glycosylation sites at Asn165 and Asn325, was
analyzed in more detail. For Exg molecules that received two oligosaccharides it
was shown that the first N-glycosylation site (Asn165) was enriched in truncated
residues, whereas the second (Asn325) was enriched in regular oligosaccharides.
35-44% of secreted exoglucanase was non- or underglycosylated and about 73-78%
of all available N-glycosylation sites were occupied with either truncated or
regular oligosaccharides (Cueva, 1996).
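
The two figures quoted from Cueva (1996) are linked by simple arithmetic for a protein with two sites. The sketch below assumes every molecule carries zero, one or two glycans and that the quoted percentages describe the same population; the function names are hypothetical and the results are illustrative bounds, not data from the cited study.

```python
def occupancy_bounds(under_fraction: float) -> tuple[float, float]:
    """Bounds on mean site occupancy for a protein with two N-glycosylation sites.

    under_fraction is the share of molecules carrying fewer than two glycans.
    Lower bound: all of them carry no glycan. Upper bound: all carry exactly one.
    """
    return (1.0 - under_fraction, 1.0 - under_fraction / 2.0)


def unglycosylated_fraction(under_fraction: float, occupancy: float) -> float:
    """Share of molecules with no glycan implied by the two figures.

    Follows from occupancy = 1 - f0 - f1/2 with f0 + f1 = under_fraction.
    """
    return 2.0 * (1.0 - occupancy) - under_fraction


for under in (0.35, 0.44):
    low, high = occupancy_bounds(under)
    print(f"under-glycosylated {under:.0%}: occupancy between {low:.1%} and {high:.1%}")
```

Under these assumptions the reported 73-78% occupancy lies inside the computed bounds for both ends of the 35-44% range, so the two statements are mutually consistent.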

Transfer of Glucosylated Lipid-Linked Oligosaccharides

[0011] Evidence suggests that, in mammalian cells, only glucosylated
lipid-linked oligosaccharides are transferred to nascent proteins (Turco, 1977), while in
yeast alg5, alg6 and dpg1 mutants, nonglucosylated oligosaccharides can be
transferred (Ballou, 1986; Runge, 1984). In a Saccharomyces cerevisiae alg8
mutant, underglucosylated GlcMan9GlcNAc2 is transferred (Runge, 1986).
Verostek and co-workers studied an alg3, sec18, gls1 mutant and proposed that
glucosylation of a Man5GlcNAc2 structure (Structure I, above) is relatively slow in
comparison to glucosylation of a lipid-linked Man9 structure. In addition, the
transfer of this Man5GlcNAc2 structure to protein appears to be about 5-fold more
efficient than the glucosylation to Glc3Man5GlcNAc2. The decreased rate of
Man5GlcNAc2 glucosylation in combination with the comparatively faster rate of
Man5 structure transfer onto nascent protein is believed to be the cause of the
observed accumulation of nonglucosylated Man5 structures in alg3 mutant yeast
(Verostek-a, 1993; Verostek-b, 1993).

[0012] Studies preceding the above work did not reveal any lipid-linked
glucosylated oligosaccharides (Orlean, 1990; Huffaker, 1983), allowing the
conclusion that glucosylated oligosaccharides are transferred at a much higher rate
than their nonglucosylated counterparts and thus are much harder to isolate.
Recent work has allowed the creation and study of yeast strains with un- and
hypoglucosylated oligosaccharides and has further confirmed the importance of the
addition of glucose to the antenna of lipid-linked oligosaccharides for substrate
recognition by the oligosaccharyltransferase complex (Reiss, 1996; Stagljar, 1994;
Burda, 1998). The decreased degree of glucosylation of the lipid-linked
Man5 oligosaccharides in an alg3 mutant negatively impacts the kinetics of the transfer
of lipid-linked oligosaccharides onto nascent protein and is believed to be the
cause for the strong underglycosylation of secreted proteins in an alg3 knock-out
strain (Aebi, 1996).




[0013] The assembly of the lipid-linked core oligosaccharide Man9GlcNAc2
occurs, as described above, at the membrane of the endoplasmic reticulum. The
additions of three glucose units to the α-1,3-antenna of the lipid-linked
oligosaccharides are the final reactions in the oligosaccharide assembly. First an
α-1,3 glucose residue is added, followed by another α-1,3 glucose residue and a
terminal α-1,2 glucose residue. Mutants accumulating dolichol-linked
Man9GlcNAc2 have been shown to be defective in the ALG6 locus, and Alg6p has
similarities to Alg8p, the α-1,3-glucosyltransferase catalyzing the addition of the
second α-1,3-linked glucose (Reiss, 1996). Cells with a defective ALG8 locus
accumulate dolichol-linked Glc1Man9GlcNAc2 (Runge, 1986; Stagljar, 1994). The
ALG10 locus encodes the α-1,2 glucosyltransferase responsible for the addition of
a single terminal glucose to Glc2Man9GlcNAc2-PP-Dol (Burda, 1998).
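
The gene-to-intermediate relationships described in the preceding paragraphs can be summarized as an ordered table of assembly steps. The following is a deliberately simplified sketch: it models a strict linear pathway containing only the steps named in the text, collapses the mannosylations between Man6 and Man9 into one placeholder step (the name `MAN6_TO_MAN9` is hypothetical), and ignores the slow glucosylation and direct transfer of Man5 structures reported for alg3 mutants.

```python
# Each step is (gene, substrate, product) on the dolichol-linked precursor.
# Only steps named in the text are modeled; the placeholder step is an assumption.
STEPS = [
    ("ALG3", "Man5GlcNAc2", "Man6GlcNAc2"),
    ("MAN6_TO_MAN9", "Man6GlcNAc2", "Man9GlcNAc2"),  # hypothetical collapsed step
    ("ALG6", "Man9GlcNAc2", "Glc1Man9GlcNAc2"),
    ("ALG8", "Glc1Man9GlcNAc2", "Glc2Man9GlcNAc2"),
    ("ALG10", "Glc2Man9GlcNAc2", "Glc3Man9GlcNAc2"),
]


def accumulated_precursor(defective_gene: str) -> str:
    """Return the lipid-linked oligosaccharide at which assembly stalls
    when the given gene is inactive, under the linear-pathway assumption."""
    wanted = defective_gene.upper()
    for gene, substrate, _product in STEPS:
        if gene == wanted:
            return substrate
    raise ValueError(f"gene not in this simplified model: {defective_gene}")


for gene in ("alg3", "alg6", "alg8", "alg10"):
    print(f"{gene}: accumulates {accumulated_precursor(gene)}-PP-Dol")
```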

Sequential Processing of N-glycans by Localized Enzyme Activities

[0014] Sugar transferases and mannosidases line the inner (luminal) surface of
the ER and Golgi apparatus and thereby provide a "catalytic" surface that allows
for the sequential processing of glycoproteins as they proceed through the ER and
Golgi network. In fact, the multiple compartments of the cis, medial, and trans
Golgi and the trans-Golgi Network (TGN) provide the different localities in
which the ordered sequence of glycosylation reactions can take place. As a
glycoprotein proceeds from synthesis in the ER to full maturation in the late Golgi
or TGN, it is sequentially exposed to different glycosidases, mannosidases and
glycosyltransferases such that a specific carbohydrate structure may be synthesized.
Much work has been dedicated to revealing the exact mechanism by which these
enzymes are retained and anchored to their respective organelle. The evolving
picture is complex, but evidence suggests that the stem region, membrane-spanning
region and cytoplasmic tail, individually or in concert, direct enzymes to the
membrane of individual organelles and thereby localize the associated catalytic
domain to that locus.

[0015] In some cases these specific interactions were found to function across
species. For example, the membrane-spanning domain of α2,6-ST from rats, an
enzyme known to localize in the trans-Golgi of the animal, was shown to also
localize a reporter gene (invertase) in the yeast Golgi (Schwientek, 1995).
However, the very same membrane-spanning domain as part of a full-length α2,6-ST
was retained in the ER and not further transported to the Golgi of yeast
(Krezdorn, 1994). A full-length GalT from humans was not even synthesized in
yeast, despite demonstrably high transcription levels. On the other hand, the
transmembrane region of the same human GalT fused to an invertase reporter was
able to direct localization to the yeast Golgi, albeit at low production levels.
Schwientek and co-workers have shown that fusing 28 amino acids of a yeast
mannosyltransferase (Mnt1), a region containing a cytoplasmic tail, a
transmembrane region and eight amino acids of the stem region, to the catalytic
domain of human GalT is sufficient for Golgi localization of an active GalT.
Other galactosyltransferases appear to rely on interactions with enzymes resident
in particular organelles since after removal of their transmembrane region they are
still able to localize properly. To date there exists no reliable way of predicting
whether a particular heterologously expressed glycosyltransferase or mannosidase
in a lower eukaryote will be (1) sufficiently translated, (2) catalytically active or
(3) located to the proper organelle within the secretory pathway. Since all three of
these are necessary to affect glycosylation patterns in lower eukaryotes, a
systematic scheme to achieve the desired catalytic function and proper retention of
enzymes in the absence of predictive tools, which are currently not available, has
been designed.

Production of Therapeutic Glycoproteins 

[0016] A significant number of proteins isolated from humans or animals are
post-translationally modified, with glycosylation being one of the most significant
modifications. An estimated 70% of all therapeutic proteins are glycosylated and
thus currently rely on a production system (i.e., host cell) that is able to glycosylate
in a manner similar to humans. To date, most glycoproteins are made in a
mammalian host system. Several studies have shown that glycosylation plays an
important role in determining the (1) immunogenicity, (2) pharmacokinetic
properties, (3) trafficking, and (4) efficacy of therapeutic proteins. It is thus not
surprising that substantial efforts by the pharmaceutical industry have been
directed at developing processes to obtain glycoproteins that are as "humanoid" or
"human-like" as possible. This may involve the genetic engineering of such
mammalian cells to enhance the degree of sialylation (i.e., terminal addition of
sialic acid) of proteins expressed by the cells, which is known to improve
pharmacokinetic properties of such proteins. Alternatively, one may improve the
degree of sialylation by in vitro addition of such sugars using known
glycosyltransferases and their respective nucleotide sugars (e.g., 2,3
sialyltransferase and CMP-sialic acid).

[0017] Future research may reveal the biological and therapeutic significance of
specific glycoforms, thereby rendering the ability to produce such specific
glycoforms desirable. To date, efforts have concentrated on making proteins with
fairly well characterized glycosylation patterns, and expressing a cDNA encoding
such a protein in one of the following higher eukaryotic protein expression
systems:

1. Higher eukaryotes such as Chinese hamster ovary cells (CHO),
mouse fibroblast cells and mouse myeloma cells (Werner, 1998);

2. Transgenic animals such as goats, sheep, mice and others (Dente,
1988); (Cole, 1994); (McGarvey, 1995); (Bardor, 1999);

3. Plants (Arabidopsis thaliana, tobacco etc.) (Staub, 2000);
(McGarvey, 1995); (Bardor, 1999);

4. Insect cells (Spodoptera frugiperda Sf9, Sf21, Trichoplusia ni, etc.)
in combination with recombinant baculoviruses such as Autographa californica
multiple nuclear polyhedrosis virus, which infects lepidopteran cells (Altmann,
1999).

[0018] While most higher eukaryotes carry out glycosylation reactions that are
similar to those found in humans, recombinant human proteins expressed in the
above mentioned host systems invariably differ from their "natural" human
counterpart (Raju, 2000). Extensive development work has thus been directed at
finding ways to improve the "human character" of proteins made in these
expression systems. This includes the optimization of fermentation conditions and
the genetic modification of protein expression hosts by introducing genes encoding
enzymes involved in the formation of human-like glycoforms (Werner, 1998);
(Weikert, 1999); (Andersen, 1994); (Yang, 2000). Inherent problems associated
with all mammalian expression systems have not been solved.

[0019] Fermentation processes based on mammalian cell culture (e.g., CHO,
murine, or human cells), for example, tend to be very slow (fermentation times in
excess of one week are not uncommon), often yield low product titers, require
expensive nutrients and cofactors (e.g., bovine fetal serum), are limited by
programmed cell death (apoptosis), and often do not enable expression of
particular therapeutically valuable proteins. More importantly, mammalian cells
are susceptible to viruses that have the potential to be human pathogens, and
stringent quality controls are required to assure product safety. This is of particular
concern since many such processes require the addition of complex and
temperature-sensitive media components that are derived from animals (e.g.,
bovine calf serum), which may carry agents pathogenic to humans such as bovine
spongiform encephalopathy (BSE) prions or viruses. Moreover, the production of
therapeutic compounds is preferably carried out in a well-controlled sterile
environment. An animal farm, no matter how cleanly kept, does not constitute
such an environment, thus constituting an additional problem in the use of
transgenic animals for manufacturing high volume therapeutic proteins.

[0020] Most, if not all, currently produced therapeutic glycoproteins are therefore
expressed in mammalian cells and much effort has been directed at improving (i.e.,
"humanizing") the glycosylation pattern of these recombinant proteins. Changes in
medium composition as well as the co-expression of genes encoding enzymes
involved in human glycosylation have been successfully employed (see, for
example, Weikert, 1999).

[0021] While recombinant proteins similar to their human counterparts can be
made in mammalian expression systems, it is currently not possible to make
proteins with a human-like glycosylation pattern in lower eukaryotes (fungi and
yeast). Although the core oligosaccharide structure transferred to a protein in the
endoplasmic reticulum is basically identical in mammals and lower eukaryotes,
substantial differences have been found in the subsequent processing reactions
which occur in the Golgi apparatus of fungi and mammals. In fact, even
amongst different lower eukaryotes there exists a great variety of glycosylation
structures. This has prevented the use of lower eukaryotes as hosts for the
production of recombinant human glycoproteins despite otherwise notable
advantages over mammalian expression systems, such as: (1) generally higher
product titers, (2) shorter fermentation times, (3) having an alternative for proteins
that are poorly expressed in mammalian cells, (4) the ability to grow in a
chemically defined protein-free medium and thus not requiring complex
animal-derived media components, and (5) the absence of viral, especially retroviral,
infections of such hosts.

[0022] Various methylotrophic yeasts such as Pichia pastoris, Pichia
methanolica, and Hansenula polymorpha have played particularly important roles
as eukaryotic expression systems because they are able to grow to high cell
densities and secrete large quantities of recombinant protein. However, as noted
above, lower eukaryotes such as yeast do not glycosylate proteins like higher
mammals. See, for example, Martinet et al. (1998) Biotechnol. Lett. Vol. 20, No. 12,
which discloses the expression of a heterologous mannosidase in the endoplasmic
reticulum (ER).

[0023] Chiba et al. (1998) have shown that S. cerevisiae can be engineered to
provide structures ranging from Man8GlcNAc2 to Man5GlcNAc2 structures, by
eliminating 1,6 mannosyltransferase (OCH1), 1,3 mannosyltransferase (MNN1)
and a regulator of mannosylphosphatetransferase (MNN4) and by targeting the
catalytic domain of α-1,2-mannosidase I from Aspergillus saitoi into the ER of
S. cerevisiae using an ER retrieval sequence (Chiba, 1998). However, this attempt
resulted in little or no production of the desired Man5GlcNAc2, e.g., one that was
made in vivo and which could function as a substrate for GnTI (the next step in
making human-like glycan structures). Chiba et al. (1998) showed that P. pastoris
is not inherently able to produce useful quantities (greater than 5%) of
GlcNAcTransferase I accepting carbohydrate.

[0024] Maras and co-workers assert that in T. reesei "sufficient concentrations of
acceptor substrate (i.e. Man5GlcNAc2) are present"; however, when trying to
convert this acceptor substrate to GlcNAcMan5GlcNAc2 in vitro, less than 2% was
converted, thereby demonstrating the presence of Man5GlcNAc2 structures that are
not suitable precursors for complex N-glycan formation (Maras, 1997; Maras,
1999). To date no enabling disclosure exists that allows for the production of
commercially relevant quantities of GlcNAcMan5GlcNAc2 in lower eukaryotes.

[0025] It is therefore an object of the present invention to provide a system and
methods for humanizing glycosylation of recombinant glycoproteins expressed in
non-human host cells.

SUMMARY OF THE INVENTION

[0026] The present invention relates to host cells such as fungal strains having
modified lipid-linked oligosaccharides which may be modified further by
heterologous expression of a set of glycosyltransferases, sugar transporters and
mannosidases to become host-strains for the production of mammalian, e.g.,
human therapeutic glycoproteins. A protein production method has been
developed using (1) a lower eukaryotic host such as a unicellular or filamentous
fungus, or (2) any non-human eukaryotic organism that has a different
glycosylation pattern from humans, to modify the glycosylation composition and
structures of the proteins made in a host organism ("host cell") so that they
resemble more closely carbohydrate structures found in human proteins. The
process allows one to obtain an engineered host cell which can be used to express
and target any desirable gene(s) involved in glycosylation by methods that are well
established in the scientific literature and generally known to the artisan in the field
of protein expression. As described herein, host cells with modified lipid-linked
oligosaccharides are created or selected. N-glycans made in the engineered host
cells have a GlcNAcMan3GlcNAc2 core structure which may then be modified
further by heterologous expression of one or more enzymes, e.g.,
glycosyltransferases, sugar transporters and mannosidases, to yield human-like
glycoproteins. For the production of therapeutic proteins, this method may be
adapted to engineer cell lines in which any desired glycosylation structure may be
obtained.

BRIEF DESCRIPTION OF THE DRAWINGS 

[0027] Figure 1 is a schematic of the structure of the dolichyl
pyrophosphate-linked oligosaccharide.
[0028] Figure 2 is a schematic of the generation of GlcNAc2Man3GlcNAc2
N-glycans from fungal host cells which are deficient in alg3, alg9 or alg12 activities.
[0029] Figure 3 is a schematic of processing reactions required to produce
mammalian-type oligosaccharide structures in a fungal host cell with an alg3, och1
genotype.
[0030] Figure 4 shows S. cerevisiae Alg3 Sequence Comparisons (Blast)
[0031] Figure 5 shows S. cerevisiae Alg3 and Alg3p Sequences
[0032] Figure 6 shows P. pastoris Alg3 and Alg3p Sequences
[0033] Figure 7 shows P. pastoris Alg3 Sequence Comparisons (Blast)

[0034] Figure 8 shows K. lactis Alg3 and Alg3p Sequences
[0035] Figure 9 shows K. lactis Alg3 Sequence Comparisons (Blast)
[0036] Figure 10 shows S. cerevisiae Alg9 and Alg9p Sequences
[0037] Figure 11 shows P. pastoris Alg9 and Alg9p Sequences
[0038] Figure 12 shows P. pastoris Alg9 Sequence Comparisons (Blast)
[0039] Figure 13 shows S. cerevisiae Alg12 and Alg12p Sequences
[0040] Figure 14 shows P. pastoris Alg12 and Alg12p Sequences
[0041] Figure 15 shows P. pastoris Alg12 Sequence Comparisons (Blast)
[0042] Figure 16 is a MALDI-TOF-MS analysis of N-glycans isolated from a
kringle 3 glycoprotein produced in P. pastoris, showing that the predominant
N-glycan is GlcNAcMan5GlcNAc2.

[0043] Figure 17 is a MALDI-TOF-MS analysis of N-glycans isolated from a
kringle 3 glycoprotein produced in P. pastoris (Fig. 16) treated with
β-N-hexosaminidase (peak corresponding to Man5GlcNAc2) to confirm that the
predominant N-glycan of Fig. 16 is GlcNAcMan5GlcNAc2.
[0044] Figure 18 is a MALDI-TOF-MS analysis of N-glycans isolated from a
kringle 3 glycoprotein produced in a P. pastoris alg3 deletion mutant, showing that
the predominant N-glycans are GlcNAcMan3GlcNAc2 and GlcNAcMan4GlcNAc2.
[0045] Figure 19 is a MALDI-TOF-MS analysis of N-glycans isolated from a
kringle 3 glycoprotein produced in a P. pastoris alg3 deletion mutant treated with
α1,2 mannosidase, showing that the GlcNAcMan4GlcNAc2 of Fig. 18 is converted
to GlcNAcMan3GlcNAc2.
[0046] Figure 20 is a MALDI-TOF-MS analysis of N-glycans of Fig. 19 treated
with β-N-hexosaminidase (peak corresponding to Man3GlcNAc2) to confirm that
the N-glycan of Fig. 19 is GlcNAcMan3GlcNAc2.

[0047] Figure 21 is a MALDI-TOF-MS analysis of N-glycans isolated from a
kringle 3 glycoprotein produced in a P. pastoris alg3 deletion mutant treated with
α1,2 mannosidase and GnTII, showing that the GlcNAcMan3GlcNAc2 of Fig. 19 is
converted to GlcNAc2Man3GlcNAc2.
[0048] Figure 22 is a MALDI-TOF-MS analysis of N-glycans of Fig. 21 treated
with β-N-hexosaminidase (peak corresponding to Man3GlcNAc2) to confirm that
the N-glycan of Fig. 21 is GlcNAc2Man3GlcNAc2.
[0049] Figure 23 is a MALDI-TOF-MS analysis of N-glycans isolated from a
kringle 3 glycoprotein produced in a P. pastoris alg3 deletion mutant treated with
α1,2 mannosidase and GnTII in the presence of UDP-galactose and
β1,4-galactosyltransferase, showing that the GlcNAc2Man3GlcNAc2 of Fig. 21 is
converted to Gal2GlcNAc2Man3GlcNAc2.
[0050] Figure 24 is a MALDI-TOF-MS analysis of N-glycans isolated from a
kringle 3 glycoprotein produced in a P. pastoris alg3 deletion mutant treated with
α1,2 mannosidase and GnTII in the presence of UDP-galactose and
β1,4-galactosyltransferase, and further treated with CMP-N-acetylneuraminic acid and
sialyltransferase, showing that the Gal2GlcNAc2Man3GlcNAc2 is converted to
NANA2Gal2GlcNAc2Man3GlcNAc2.

[0051] Figure 25 shows S. cerevisiae Alg6 and Alg6p Sequences.

[0052] Figure 26 shows P. pastoris Alg6 and Alg6p Sequences.

[0053] Figure 27 shows P. pastoris Alg6 Sequence Comparisons (Blast).

[0054] Figure 28 shows K. lactis Alg6 and Alg6p Sequences.

[0055] Figure 29 shows K. lactis Alg6 Sequence Comparisons (Blast).
[0056] Figure 30 shows a model of an IgG immunoglobulin. Heavy chain and light
chain can be, based on similar secondary and tertiary structure, subdivided into
domains. The two heavy chains (domains VH, CH1, CH2 and CH3) are linked
through three disulfide bridges. The light chains (domains VL and CL) are linked by
another disulfide bridge to the CH1 portion of the heavy chain and, together with
the CH1 and VH fragments, make up the Fab region. Antigens bind to the terminal
portion of the Fab region. Effector functions, such as Fc-gamma-receptor binding,
have been localized to the CH2 domain, just downstream of the hinge region, and
are influenced by N-glycosylation of asparagine 297 in the heavy chain.
[0057] Figure 31 is a schematic overview of a modular IgG1 expression vector.
[0058] Figure 32 shows M. musculus GnTIII Nucleic Acid And Amino Acid
Sequences.

[0059] Figure 33 shows H. sapiens GnTIV Nucleic Acid And Amino Acid
Sequences.

[0060] Figure 34 shows M. musculus GnTV Nucleic Acid And Amino Acid
Sequences.

DETAILED DESCRIPTION OF THE INVENTION 

[0061] Unless otherwise defined herein, scientific and technical terms used in
connection with the present invention shall have the meanings that are commonly
understood by those of ordinary skill in the art. Further, unless otherwise required
by context, singular terms shall include pluralities and plural terms shall include
the singular. The methods and techniques of the present invention are generally
performed according to conventional methods well known in the art. Generally,
nomenclatures used in connection with, and techniques of, biochemistry,
enzymology, molecular and cellular biology, microbiology, genetics and protein
and nucleic acid chemistry and hybridization described herein are those well
known and commonly used in the art. The methods and techniques of the present
invention are generally performed according to conventional methods well known
in the art and as described in various general and more specific references that are
cited and discussed throughout the present specification unless otherwise indicated.
See, e.g., Sambrook et al. Molecular Cloning: A Laboratory Manual, 2d ed., Cold
Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1989); Ausubel et al.,
Current Protocols in Molecular Biology, Greene Publishing Associates (1992, and
Supplements to 2002); Harlow and Lane, Antibodies: A Laboratory Manual, Cold
Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1990); Introduction to
Glycobiology, Maureen E. Taylor, Kurt Drickamer, Oxford Univ. Press (2003);
Worthington Enzyme Manual, Worthington Biochemical Corp., Freehold, NJ;
Handbook of Biochemistry: Section A Proteins Vol I 1976 CRC Press; Handbook
of Biochemistry: Section A Proteins Vol II 1976 CRC Press; Essentials of
Glycobiology, Cold Spring Harbor Laboratory Press (1999). The nomenclatures
used in connection with, and the laboratory procedures and techniques of,
biochemistry and molecular biology described herein are those well known and
commonly used in the art.

[0062] All publications, patents and other references mentioned herein are 
incorporated by reference. 

[0063] The following terms, unless otherwise indicated, shall be understood to
have the following meanings:

[0064] As used herein, the term "N-glycan" refers to an N-linked
oligosaccharide, e.g., one that is attached by an asparagine-N-acetylglucosamine
linkage to an asparagine residue of a polypeptide. N-glycans have a common
pentasaccharide core of Man3GlcNAc2 ("Man" refers to mannose; "Glc" refers to
glucose; and "NAc" refers to N-acetyl; GlcNAc refers to N-acetylglucosamine).
N-glycans differ with respect to the number of branches (antennae) comprising
peripheral sugars (e.g., fucose and sialic acid) that are added to the Man3GlcNAc2
("Man3") core structure. N-glycans are classified according to their branched
constituents (e.g., high mannose, complex or hybrid). A "high mannose" type
N-glycan has five or more mannose residues. A "complex" type N-glycan typically
has at least one GlcNAc attached to the 1,3 mannose arm and at least one GlcNAc
attached to the 1,6 mannose arm of a "trimannose" core. The "trimannose core" is
the pentasaccharide core having a Man3 structure. Complex N-glycans may also
have galactose ("Gal") residues that are optionally modified with sialic acid or
derivatives ("NeuAc", where "Neu" refers to neuraminic acid and "Ac" refers to
acetyl). Complex N-glycans may also have intrachain substitutions comprising
"bisecting" GlcNAc and core fucose ("Fuc"). A "hybrid" N-glycan has at least
one GlcNAc on the terminal of the 1,3 mannose arm of the trimannose core and
zero or more mannoses on the 1,6 mannose arm of the trimannose core.
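
As a rough illustration of the classification just given, the following Python sketch labels a glycan from a simplified description of its structure. The `Glycan` record, its field names, and the order in which the rules are checked are assumptions made for this example only. In particular, a structure such as GlcNAcMan5GlcNAc2 meets both the "five or more mannose residues" criterion and the "hybrid" criterion, and the sketch arbitrarily gives the GlcNAc-based rules precedence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glycan:
    """Simplified, hypothetical description of an N-glycan."""
    mannose: int            # total mannose residues, including the core three
    glcnac_on_1_3_arm: int  # GlcNAc residues attached to the 1,3 mannose arm
    glcnac_on_1_6_arm: int  # GlcNAc residues attached to the 1,6 mannose arm


def classify(glycan: Glycan) -> str:
    """Label a glycan using the definitions in the paragraph above.

    Rule order is an assumption of this sketch: arm GlcNAc is checked
    before the mannose count.
    """
    if glycan.glcnac_on_1_3_arm >= 1 and glycan.glcnac_on_1_6_arm >= 1:
        return "complex"
    if glycan.glcnac_on_1_3_arm >= 1:
        return "hybrid"
    if glycan.mannose >= 5:
        return "high mannose"
    return "unclassified"
```

Under these assumptions, `classify(Glycan(5, 1, 0))` (GlcNAcMan5GlcNAc2) is labeled hybrid and `classify(Glycan(3, 1, 1))` (GlcNAc2Man3GlcNAc2) is labeled complex.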

[0065] Abbreviations used herein are of common usage in the art, see, e.g.,
abbreviations of sugars, above. Other common abbreviations include "PNGase",
which refers to peptide N-glycosidase F (EC 3.2.2.18); "GlcNAc Tr (I - III)",
which refers to one of three N-acetylglucosaminyltransferase enzymes; "NANA"
refers to N-acetylneuraminic acid.

[0066] As used herein, the term "secretion pathway" refers to the assembly line
of various glycosylation enzymes to which a lipid-linked oligosaccharide precursor
and an N-glycan substrate are sequentially exposed, following the molecular flow
of a nascent polypeptide chain from the cytoplasm to the endoplasmic reticulum
(ER) and the compartments of the Golgi apparatus. Enzymes are said to be
localized along this pathway. An enzyme X that acts on a lipid-linked glycan or an
N-glycan before enzyme Y is said to be or to act "upstream" to enzyme Y;
similarly, enzyme Y is or acts "downstream" from enzyme X.

[0067] As used herein, the term "alg X activity" refers to the enzymatic activity
encoded by the "alg X" gene, and to an enzyme having that enzymatic activity
encoded by a homologous gene or gene product (see below) or by an unrelated
gene or gene product.

[0068] As used herein, the term "antibody" refers to a full antibody (consisting
of two heavy chains and two light chains) or a fragment thereof. Such fragments
include, but are not limited to, those produced by digestion with various proteases,
those produced by chemical cleavage and/or chemical dissociation, and those
produced recombinantly, so long as the fragment remains capable of specific
binding to an antigen. Among these fragments are Fab, Fab', F(ab')2, and single
chain Fv (scFv) fragments. Within the scope of the term "antibody" are also
antibodies that have been modified in sequence, but remain capable of specific
binding to an antigen. Examples of modified antibodies are interspecies chimeric
and humanized antibodies; antibody fusions; and heteromeric antibody complexes,
such as diabodies (bispecific antibodies), single-chain diabodies, and intrabodies
(see, e.g., Marasco (ed.), Intracellular Antibodies: Research and Disease
Applications, Springer-Verlag New York, Inc. (1998) (ISBN: 3540641513), the
disclosure of which is incorporated herein by reference in its entirety).
[0069] As used herein, the term "mutation" refers to any change in the nucleic
acid or amino acid sequence of a gene product, e.g., of a glycosylation-related
enzyme.






[0070] The term "polynucleotide" or "nucleic acid molecule" refers to a
polymeric form of nucleotides of at least 10 bases in length. The term includes
DNA molecules (e.g., cDNA or genomic or synthetic DNA) and RNA molecules
(e.g., mRNA or synthetic RNA), as well as analogs of DNA or RNA containing
non-natural nucleotide analogs, non-native internucleoside bonds, or both. The
nucleic acid can be in any topological conformation. For instance, the nucleic acid
can be single-stranded, double-stranded, triple-stranded, quadruplexed, partially
double-stranded, branched, hairpinned, circular, or in a padlocked conformation.
The term includes single and double stranded forms of DNA.

[0071] Unless otherwise indicated, a "nucleic acid comprising SEQ ID NO:X"
refers to a nucleic acid, at least a portion of which has either (i) the sequence of
SEQ ID NO:X, or (ii) a sequence complementary to SEQ ID NO:X. The choice
between the two is dictated by the context. For instance, if the nucleic acid is used
as a probe, the choice between the two is dictated by the requirement that the probe
be complementary to the desired target.
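
The two-way reading of "comprising SEQ ID NO:X" can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, sequences are assumed to be plain DNA strings written 5' to 3', and "complementary" is interpreted here as the reverse complement.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (read 5' to 3')."""
    return seq.translate(_COMPLEMENT)[::-1]


def comprises(nucleic_acid: str, reference: str) -> bool:
    """True if a portion of nucleic_acid has either (i) the reference
    sequence or (ii) a sequence complementary to it."""
    target = nucleic_acid.upper()
    ref = reference.upper()
    return ref in target or reverse_complement(ref) in target
```

For a probe, the second branch is the relevant one: a probe that contains the reverse complement of the reference will base-pair with a target strand carrying the reference sequence.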

[0072] An "isolated" or "substantially pure" nucleic acid or polynucleotide (e.g.,
an RNA, DNA or a mixed polymer) is one which is substantially separated from
other cellular components that naturally accompany the native polynucleotide in its
natural host cell, e.g., ribosomes, polymerases, and genomic sequences with which
it is naturally associated. The term embraces a nucleic acid or polynucleotide that
(1) has been removed from its naturally occurring environment, (2) is not
associated with all or a portion of a polynucleotide in which the "isolated
polynucleotide" is found in nature, (3) is operatively linked to a polynucleotide
which it is not linked to in nature, or (4) does not occur in nature. The term
"isolated" or "substantially pure" also can be used in reference to recombinant or
cloned DNA isolates, chemically synthesized polynucleotide analogs, or
polynucleotide analogs that are biologically synthesized by heterologous systems.
[0073] However, "isolated" does not necessarily require that the nucleic acid or
polynucleotide so described has itself been physically removed from its native
environment. For instance, an endogenous nucleic acid sequence in the genome of
an organism is deemed "isolated" herein if a heterologous sequence (i.e., a
sequence that is not naturally adjacent to this endogenous nucleic acid sequence) is
placed adjacent to the endogenous nucleic acid sequence, such that the expression
of this endogenous nucleic acid sequence is altered. By way of example, a
non-native promoter sequence can be substituted (e.g., by homologous recombination)
for the native promoter of a gene in the genome of a human cell, such that this
gene has an altered expression pattern. This gene would now become "isolated"
because it is separated from at least some of the sequences that naturally flank it.
[0074] A nucleic acid is also considered "isolated" if it contains any
modifications that do not naturally occur to the corresponding nucleic acid in a
genome. For instance, an endogenous coding sequence is considered "isolated" if
it contains an insertion, deletion or a point mutation introduced artificially, e.g., by
human intervention. An "isolated nucleic acid" also includes a nucleic acid
integrated into a host cell chromosome at a heterologous site, or a nucleic acid
construct present as an episome. Moreover, an "isolated nucleic acid" can be
substantially free of other cellular material, or substantially free of culture medium
when produced by recombinant techniques, or substantially free of chemical
precursors or other chemicals when chemically synthesized.
[0075] As used herein, the phrase "degenerate variant" of a reference nucleic
acid sequence encompasses nucleic acid sequences that can be translated,
according to the standard genetic code, to provide an amino acid sequence identical
to that translated from the reference nucleic acid sequence.
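
The definition of a degenerate variant lends itself to a direct check: translate both sequences with the standard genetic code and compare the results. The Python sketch below is illustrative only; the function names are hypothetical, and the inputs are assumed to be in-frame coding sequences whose length is a multiple of three.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# ("*" marks a stop codon).
_BASES = "TCAG"
_AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(seq: str) -> str:
    """Translate an in-frame DNA or RNA coding sequence, codon by codon."""
    seq = seq.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def is_degenerate_variant(candidate: str, reference: str) -> bool:
    """True if both sequences encode the identical amino acid sequence."""
    return translate(candidate) == translate(reference)
```

For example, `ATGGCCTAA` and `ATGGCTTAA` differ at the third position of the second codon, yet both encode Met-Ala followed by a stop, so each is a degenerate variant of the other.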

[0076] The term "percent sequence identity" or "identical" in the context of
nucleic acid sequences refers to the residues in the two sequences which are the
same when aligned for maximum correspondence. The length of sequence identity
comparison may be over a stretch of at least about nine nucleotides, usually at least
about 20 nucleotides, more usually at least about 24 nucleotides, typically at least
about 28 nucleotides, more typically at least about 32 nucleotides, and preferably
at least about 36 or more nucleotides. There are a number of different algorithms
known in the art which can be used to measure nucleotide sequence identity. For
instance, polynucleotide sequences can be compared using FASTA, Gap or Bestfit,
which are programs in Wisconsin Package Version 10.0, Genetics Computer
Group (GCG), Madison, Wisconsin. FASTA provides alignments and percent
sequence identity of the regions of the best overlap between the query and search
sequences (Pearson, 1990, herein incorporated by reference). For instance,
percent sequence identity between nucleic acid sequences can be determined using
FASTA with its default parameters (a word size of 6 and the NOPAM factor for
the scoring matrix) or using Gap with its default parameters as provided in GCG
Version 6.1, herein incorporated by reference.
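
To make the notion of percent sequence identity concrete, the following Python sketch computes it for two sequences that have already been aligned. It is not the FASTA, Gap or Bestfit implementation; the function name, the `-` gap symbol, and the choice to count every alignment column (including gapped ones) in the denominator are assumptions of this example, and other conventions give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns in which both sequences carry the
    same residue. Columns that are gaps in both sequences are ignored."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == gap and b == gap)
    ]
    if not columns:
        raise ValueError("alignment has no comparable columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)
```

With this convention, the aligned pair `ACGTACGT` / `ACGTTCGT` has seven matching columns out of eight, i.e. 87.5% identity.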

[0077] The term "substantial homology" or "substantial similarity," when
referring to a nucleic acid or fragment thereof, indicates that, when optimally
aligned with appropriate nucleotide insertions or deletions with another nucleic
acid (or its complementary strand), there is nucleotide sequence identity in at least
about 50%, more preferably 60% of the nucleotide bases, usually at least about
70%, more usually at least about 80%, preferably at least about 90%, and more
preferably at least about 95%, 96%, 97%, 98% or 99% of the nucleotide bases, as
measured by any well-known algorithm of sequence identity, such as FASTA,
BLAST or Gap, as discussed above.

[0078] Alternatively, substantial homology or similarity exists when a nucleic
acid or fragment thereof hybridizes to another nucleic acid, to a strand of another
nucleic acid, or to the complementary strand thereof, under stringent hybridization
conditions. "Stringent hybridization conditions" and "stringent wash conditions"
in the context of nucleic acid hybridization experiments depend upon a number of
different physical parameters. Nucleic acid hybridization will be affected by such
conditions as salt concentration, temperature, solvents, the base composition of the
hybridizing species, length of the complementary regions, and the number of
nucleotide base mismatches between the hybridizing nucleic acids, as will be
readily appreciated by those skilled in the art. One having ordinary skill in the art
knows how to vary these parameters to achieve a particular stringency of
hybridization.

[0079] In general, "stringent hybridization" is performed at about 25°C below the
thermal melting point (Tm) for the specific DNA hybrid under a particular set of
conditions. "Stringent washing" is performed at temperatures about 5°C lower
than the Tm for the specific DNA hybrid under a particular set of conditions. The
Tm is the temperature at which 50% of the target sequence hybridizes to a perfectly
matched probe. See Sambrook et al., supra, page 9.51, hereby incorporated by
reference. For purposes herein, "high stringency conditions" are defined for
solution phase hybridization as aqueous hybridization (i.e., free of formamide) in
6X SSC (where 20X SSC contains 3.0 M NaCl and 0.3 M sodium citrate), 1% SDS
at 65°C for 8-12 hours, followed by two washes in 0.2X SSC, 0.1% SDS at 65°C
for 20 minutes. It will be appreciated by the skilled worker that hybridization at
65°C will occur at different rates depending on a number of factors including the
length and percent identity of the sequences which are hybridizing.
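
The arithmetic behind these conditions can be sketched as follows. The Python below is a worked illustration, not a protocol: the function names are hypothetical, the 20X SSC stock is taken as 3.0 M NaCl and 0.3 M sodium citrate, and the Tm passed in is an arbitrary example value rather than one given in this specification.

```python
# Molar concentrations of the 20X SSC stock.
STOCK_20X_SSC = {"NaCl": 3.0, "sodium citrate": 0.3}


def ssc_molarity(strength_x: float) -> dict:
    """Solute concentrations (mol/L) of SSC diluted to the given strength."""
    return {
        solute: molar * strength_x / 20.0
        for solute, molar in STOCK_20X_SSC.items()
    }


def stringent_temperatures(tm_celsius: float) -> tuple:
    """Return (hybridization, wash) temperatures in degrees Celsius:
    about 25 degrees below Tm and about 5 degrees below Tm, respectively."""
    return tm_celsius - 25.0, tm_celsius - 5.0
```

Under these assumptions, 6X SSC works out to roughly 0.9 M NaCl and 0.09 M sodium citrate, and 0.2X SSC to roughly 0.03 M NaCl and 0.003 M sodium citrate. For a hypothetical hybrid with a Tm of 90°C, the general rule would place stringent hybridization near 65°C and stringent washing near 85°C.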
[0080] The nucleic acids (also referred to as polynucleotides) of this invention
may include both sense and antisense strands of RNA, cDNA, genomic DNA, and
synthetic forms and mixed polymers of the above. They may be modified
chemically or biochemically or may contain non-natural or derivatized nucleotide
bases, as will be readily appreciated by those of skill in the art. Such modifications
include, for example, labels, methylation, substitution of one or more of the
naturally occurring nucleotides with an analog, internucleotide modifications such
as uncharged linkages (e.g., methyl phosphonates, phosphotriesters,
phosphoramidates, carbamates, etc.), charged linkages (e.g., phosphorothioates,
phosphorodithioates, etc.), pendent moieties (e.g., polypeptides), intercalators (e.g.,
acridine, psoralen, etc.), chelators, alkylators, and modified linkages (e.g., alpha
anomeric nucleic acids, etc.). Also included are synthetic molecules that mimic
polynucleotides in their ability to bind to a designated sequence via hydrogen
bonding and other chemical interactions. Such molecules are known in the art and
include, for example, those in which peptide linkages substitute for phosphate
linkages in the backbone of the molecule.

[0081] The term "mutated" when applied to nucleic acid sequences means that
nucleotides in a nucleic acid sequence may be inserted, deleted or changed
compared to a reference nucleic acid sequence. A single alteration may be made at
a locus (a point mutation) or multiple nucleotides may be inserted, deleted or
changed at a single locus. In addition, one or more alterations may be made at any
number of loci within a nucleic acid sequence. A nucleic acid sequence may be
mutated by any method known in the art including but not limited to mutagenesis
techniques such as "error-prone PCR" (a process for performing PCR under
conditions where the copying fidelity of the DNA polymerase is low, such that a
high rate of point mutations is obtained along the entire length of the PCR product.
See, e.g., Leung, D. W., et al., Technique, 1, pp. 11-15 (1989) and Caldwell, R. C.
& Joyce G. F., PCR Methods Applic., 2, pp. 28-33 (1992)); and
"oligonucleotide-directed mutagenesis" (a process which enables the generation of
site-specific mutations in any cloned DNA segment of interest. See, e.g.,
Reidhaar-Olson, J. F. & Sauer, R. T., et al., Science, 241, pp. 53-57 (1988)).

[0082] The term "vector" as used herein is intended to refer to a nucleic acid
molecule capable of transporting another nucleic acid to which it has been linked.
One type of vector is a "plasmid", which refers to a circular double stranded DNA
loop into which additional DNA segments may be ligated. Other vectors include
cosmids, bacterial artificial chromosomes (BAC) and yeast artificial chromosomes
(YAC). Another type of vector is a viral vector, wherein additional DNA segments
may be ligated into the viral genome (discussed in more detail below). Certain
vectors are capable of autonomous replication in a host cell into which they are
introduced (e.g., vectors having an origin of replication which functions in the host
cell). Other vectors can be integrated into the genome of a host cell upon
introduction into the host cell, and are thereby replicated along with the host
genome. Moreover, certain preferred vectors are capable of directing the
expression of genes to which they are operatively linked. Such vectors are referred
to herein as "recombinant expression vectors" (or simply, "expression vectors").
[0083] "Operatively linked" expression control sequences refers to a linkage in
which the expression control sequence is contiguous with the gene of interest to
control the gene of interest, as well as expression control sequences that act in
trans or at a distance to control the gene of interest.

[0084] The term "expression control sequence" as used herein refers to
polynucleotide sequences which are necessary to affect the expression of coding
sequences to which they are operatively linked. Expression control sequences are
sequences which control the transcription, post-transcriptional events and
translation of nucleic acid sequences. Expression control sequences include
appropriate transcription initiation, termination, promoter and enhancer sequences;
efficient RNA processing signals such as splicing and polyadenylation signals;
sequences that stabilize cytoplasmic mRNA; sequences that enhance translation
efficiency (e.g., ribosome binding sites); sequences that enhance protein stability;
and when desired, sequences that enhance protein secretion. The nature of such
control sequences differs depending upon the host organism; in prokaryotes, such
control sequences generally include promoter, ribosomal binding site, and
transcription termination sequence. The term "control sequences" is intended to
include, at a minimum, all components whose presence is essential for expression,
and can also include additional components whose presence is advantageous, for
example, leader sequences and fusion partner sequences.
[0085] The term "recombinant host cell" (or simply "host cell"), as used herein,
is intended to refer to a cell into which a recombinant vector has been introduced.
It should be understood that such terms are intended to refer not only to the
particular subject cell but to the progeny of such a cell. Because certain
modifications may occur in succeeding generations due to either mutation or
environmental influences, such progeny may not, in fact, be identical to the parent
cell, but are still included within the scope of the term "host cell" as used herein. A
recombinant host cell may be an isolated cell or cell line grown in culture or may
be a cell which resides in a living tissue or organism.

[0086] The term "peptide" as used herein refers to a short polypeptide, e.g., one
that is typically less than about 50 amino acids long and more typically less than
about 30 amino acids long. The term as used herein encompasses analogs and
mimetics that mimic structural and thus biological function.
[0087] The term "polypeptide" encompasses both naturally-occurring and
non-naturally-occurring proteins, and fragments, mutants, derivatives and analogs
thereof. A polypeptide may be monomeric or polymeric. Further, a polypeptide
may comprise a number of different domains each of which has one or more
distinct activities.

[0088] The term "isolated protein" or "isolated polypeptide" is a protein or
polypeptide that by virtue of its origin or source of derivation (1) is not associated
with naturally associated components that accompany it in its native state, (2)
exists in a purity not found in nature, where purity can be adjudged with
respect to the presence of other cellular material (e.g., is free of other proteins from
the same species), (3) is expressed by a cell from a different species, or (4) does not
occur in nature (e.g., it is a fragment of a polypeptide found in nature or it includes
amino acid analogs or derivatives not found in nature or linkages other than
standard peptide bonds). Thus, a polypeptide that is chemically synthesized or
synthesized in a cellular system different from the cell from which it naturally
originates will be "isolated" from its naturally associated components. A
polypeptide or protein may also be rendered substantially free of naturally
associated components by isolation, using protein purification techniques well
known in the art. As thus defined, "isolated" does not necessarily require that the
protein, polypeptide, peptide or oligopeptide so described has been physically
removed from its native environment.

[0089] The term "polypeptide fragment" as used herein refers to a polypeptide
that has an amino-terminal and/or carboxy-terminal deletion compared to a
full-length polypeptide. In a preferred embodiment, the polypeptide fragment is a
contiguous sequence in which the amino acid sequence of the fragment is identical
to the corresponding positions in the naturally-occurring sequence. Fragments
typically are at least 5, 6, 7, 8, 9 or 10 amino acids long, preferably at least 12, 14,
16 or 18 amino acids long, more preferably at least 20 amino acids long, more
preferably at least 25, 30, 35, 40 or 45 amino acids, even more preferably at least
50 or 60 amino acids long, and even more preferably at least 70 amino acids long.

[0090] A "modified derivative" refers to polypeptides or fragments thereof that
are substantially homologous in primary structural sequence but which include,
e.g., in vivo or in vitro chemical and biochemical modifications or which
incorporate amino acids that are not found in the native polypeptide. Such
modifications include, for example, acetylation, carboxylation, phosphorylation,
glycosylation, ubiquitination, labeling, e.g., with radionuclides, and various
enzymatic modifications, as will be readily appreciated by those well skilled in the
art. A variety of methods for labeling polypeptides and of substituents or labels
useful for such purposes are well known in the art, and include radioactive isotopes
such as 125I, 32P, 35S, and 3H, ligands which bind to labeled antiligands (e.g.,
antibodies), fluorophores, chemiluminescent agents, enzymes, and antiligands
which can serve as specific binding pair members for a labeled ligand. The choice
of label depends on the sensitivity required, ease of conjugation with the primer,
stability requirements, and available instrumentation. Methods for labeling
polypeptides are well known in the art. See Ausubel et al., 1992, hereby
incorporated by reference.

[0091] The term "fusion protein" refers to a polypeptide comprising a
polypeptide or fragment coupled to heterologous amino acid sequences. Fusion
proteins are useful because they can be constructed to contain two or more desired
functional elements from two or more different proteins. A fusion protein
comprises at least 10 contiguous amino acids from a polypeptide of interest, more
preferably at least 20 or 30 amino acids, even more preferably at least 40, 50 or 60
amino acids, yet more preferably at least 75, 100 or 125 amino acids. Fusion
proteins can be produced recombinantly by constructing a nucleic acid sequence
which encodes the polypeptide or a fragment thereof in frame with a nucleic acid
sequence encoding a different protein or peptide and then expressing the fusion
protein. Alternatively, a fusion protein can be produced chemically by
crosslinking the polypeptide or a fragment thereof to another protein.

[0092] The term "non-peptide analog" refers to a compound with properties that
are analogous to those of a reference polypeptide. A non-peptide compound may
also be termed a "peptide mimetic" or a "peptidomimetic". See, e.g., Jones, (1992)
Amino Acid and Peptide Synthesis, Oxford University Press; Jung, (1997)
Combinatorial Peptide and Nonpeptide Libraries: A Handbook, John Wiley;
Bodanszky et al., (1993) Peptide Chemistry--A Practical Textbook, Springer
Verlag; "Synthetic Peptides: A Users Guide", G. A. Grant, Ed., W. H. Freeman and
Co., 1992; Evans et al. J. Med. Chem. 30:1229 (1987); Fauchere, J. Adv. Drug Res.
15:29 (1986); Veber and Freidinger TINS p.392 (1985); and references cited in
each of the above, which are incorporated herein by reference. Such compounds
are often developed with the aid of computerized molecular modeling. Peptide
mimetics that are structurally similar to useful peptides of the invention may be
used to produce an equivalent effect and are therefore envisioned to be part of the
invention.

[0093] A "polypeptide mutant" or "mutein" refers to a polypeptide whose
sequence contains an insertion, duplication, deletion, rearrangement or substitution
of one or more amino acids compared to the amino acid sequence of a native or
wild type protein. A mutein may have one or more amino acid point substitutions,
in which a single amino acid at a position has been changed to another amino acid,
one or more insertions and/or deletions, in which one or more amino acids are
inserted or deleted, respectively, in the sequence of the naturally-occurring protein,
and/or truncations of the amino acid sequence at either or both the amino or
carboxy termini. A mutein may have the same but preferably has a different
biological activity compared to the naturally-occurring protein. For instance, a
mutein may have an increased or decreased neuron or NgR binding activity. In a
preferred embodiment of the present invention, a MAG derivative that is a mutein
(e.g., in MAG Ig-like domain 5) has decreased neuronal growth inhibitory activity
compared to endogenous or soluble wild-type MAG.

[0094] A mutein has at least 70% overall sequence homology to its wild-type
counterpart. Even more preferred are muteins having 80%, 85% or 90% overall
sequence homology to the wild-type protein. In an even more preferred
embodiment, a mutein exhibits 95% sequence identity, even more preferably 97%,
even more preferably 98% and even more preferably 99% overall sequence
identity. Sequence homology may be measured by any common sequence analysis
algorithm, such as Gap or Bestfit.

[0095] Preferred amino acid substitutions are those which: (1) reduce
susceptibility to proteolysis, (2) reduce susceptibility to oxidation, (3) alter binding
affinity for forming protein complexes, (4) alter binding affinity or enzymatic
activity, and (5) confer or modify other physicochemical or functional properties of
such analogs.

[0096] As used herein, the twenty conventional amino acids and their
abbreviations follow conventional usage. See Immunology - A Synthesis (2nd
Edition, E.S. Golub and D.R. Gren, Eds., Sinauer Associates, Sunderland, Mass.
(1991)), which is incorporated herein by reference. Stereoisomers (e.g., D-amino
acids) of the twenty conventional amino acids, unnatural amino acids such as
α-, α-disubstituted amino acids, N-alkyl amino acids, and other unconventional amino
acids may also be suitable components for polypeptides of the present invention.
Examples of unconventional amino acids include: 4-hydroxyproline,
γ-carboxyglutamate, ε-N,N,N-trimethyllysine, ε-N-acetyllysine, O-phosphoserine,
N-acetylserine, N-formylmethionine, 3-methylhistidine, 5-hydroxylysine,
σ-N-methylarginine, and other similar amino acids and imino acids (e.g.,
4-hydroxyproline). In the polypeptide notation used herein, the left-hand direction
is the amino terminal direction and the right hand direction is the carboxy-terminal
direction, in accordance with standard usage and convention.

[0097] A protein has "homology" or is "homologous" to a second protein if the
nucleic acid sequence that encodes the protein has a similar sequence to the nucleic
acid sequence that encodes the second protein. Alternatively, a protein has
homology to a second protein if the two proteins have "similar" amino acid
sequences. (Thus, the term "homologous proteins" is defined to mean that the two
proteins have similar amino acid sequences). In a preferred embodiment, a
homologous protein is one that exhibits 60% sequence homology to the wild type
protein, more preferred is 70% sequence homology. Even more preferred are
homologous proteins that exhibit 80%, 85% or 90% sequence homology to the
wild type protein. In a yet more preferred embodiment, a homologous protein
exhibits 95%, 97%, 98% or 99% sequence identity. As used herein, homology
between two regions of amino acid sequence (especially with respect to predicted
structural similarities) is interpreted as implying similarity in function.
[0098] When "homologous" is used in reference to proteins or peptides, it is
recognized that residue positions that are not identical often differ by conservative
amino acid substitutions. A "conservative amino acid substitution" is one in which
an amino acid residue is substituted by another amino acid residue having a side
chain (R group) with similar chemical properties (e.g., charge or hydrophobicity).
In general, a conservative amino acid substitution will not substantially change the
functional properties of a protein. In cases where two or more amino acid
sequences differ from each other by conservative substitutions, the percent
sequence identity or degree of homology may be adjusted upwards to correct for
the conservative nature of the substitution. Means for making this adjustment are
well known to those of skill in the art (see, e.g., Pearson et al., 1994, herein
incorporated by reference).

[0099] The following six groups each contain amino acids that are conservative
substitutions for one another: 1) Serine (S), Threonine (T); 2) Aspartic Acid (D),
Glutamic Acid (E); 3) Asparagine (N), Glutamine (Q); 4) Arginine (R), Lysine
(K); 5) Isoleucine (I), Leucine (L), Methionine (M), Alanine (A), Valine (V), and
6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W).
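
By way of illustration only, percent identity and a similarity figure that also counts substitutions within the six groups listed above could be computed for two pre-aligned sequences roughly as sketched below. This sketch is not the Gap, Bestfit, FASTA or BLAST software named in this specification; the function names, the gap character and the choice to skip gapped columns are assumptions made for the example.

```python
# Six conservative-substitution groups, one set of one-letter codes per group.
CONSERVATIVE_GROUPS = [
    set("ST"),     # 1) Serine, Threonine
    set("DE"),     # 2) Aspartic Acid, Glutamic Acid
    set("NQ"),     # 3) Asparagine, Glutamine
    set("RK"),     # 4) Arginine, Lysine
    set("ILMAV"),  # 5) Isoleucine, Leucine, Methionine, Alanine, Valine
    set("FYW"),    # 6) Phenylalanine, Tyrosine, Tryptophan
]


def is_conservative(a, b):
    """Return True if two different residues belong to the same group."""
    return a != b and any(a in g and b in g for g in CONSERVATIVE_GROUPS)


def percent_identity_and_similarity(aln1, aln2, gap="-"):
    """Compare two pre-aligned sequences of equal length.

    Assumption: columns containing a gap in either sequence are skipped.
    Returns (percent identity, percent similarity), where similarity counts
    identical residues plus conservative substitutions.
    """
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have equal length")
    identical = similar = compared = 0
    for a, b in zip(aln1.upper(), aln2.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
            similar += 1
        elif is_conservative(a, b):
            similar += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared, 100.0 * similar / compared
```

Residues that appear in none of the six groups (for example glycine or proline) count only when identical in this sketch.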

[0100] Sequence homology for polypeptides, which is also referred to as percent
sequence identity, is typically measured using sequence analysis software. See,
e.g., the Sequence Analysis Software Package of the Genetics Computer Group
(GCG), University of Wisconsin Biotechnology Center, 910 University Avenue,
Madison, Wisconsin 53705. Protein analysis software matches similar sequences
using measures of homology assigned to various substitutions, deletions and other
modifications, including conservative amino acid substitutions. For instance, GCG
contains programs such as "Gap" and "Bestfit" which can be used with default
parameters to determine sequence homology or sequence identity between closely
related polypeptides, such as homologous polypeptides from different species of
organisms or between a wild type protein and a mutein thereof. See, e.g., GCG
Version 6.1.

[0101] A preferred algorithm when comparing an inhibitory molecule sequence to
a database containing a large number of sequences from different organisms is the
computer program BLAST (Altschul, S.F. et al. (1990) J. Mol. Biol. 215:403-410;
Gish and States (1993) Nature Genet. 3:266-272; Madden, T.L. et al. (1996) Meth.
Enzymol. 266:131-141; Altschul, S.F. et al. (1997) Nucleic Acids Res. 25:3389-
3402; Zhang, J. and Madden, T.L. (1997) Genome Res. 7:649-656), especially
blastp or tblastn (Altschul et al., 1997). Preferred parameters for BLASTp are:
Expectation value: 10 (default)
Filter: seg (default)
Cost to open a gap: 11 (default)
Cost to extend a gap: 1 (default)
Max. alignments: 100 (default)
Word size: 11 (default)
No. of descriptions: 100 (default)
Penalty Matrix: BLOSUM62

[0102] The length of polypeptide sequences compared for homology will
generally be at least about 16 amino acid residues, usually at least about 20
residues, more usually at least about 24 residues, typically at least about 28
residues, and preferably more than about 35 residues. When searching a database
containing sequences from a large number of different organisms, it is preferable to
compare amino acid sequences. Database searching using amino acid sequences
can be measured by algorithms other than blastp known in the art. For instance,
polypeptide sequences can be compared using FASTA, a program in GCG Version
6.1. FASTA provides alignments and percent sequence identity of the regions of
the best overlap between the query and search sequences (Pearson, 1990, herein
incorporated by reference). For example, percent sequence identity between amino
acid sequences can be determined using FASTA with its default parameters (a
word size of 2 and the PAM250 scoring matrix), as provided in GCG Version 6.1,
herein incorporated by reference.

[0103] "Specific binding" refers to the ability of two molecules to bind to each
other in preference to binding to other molecules in the environment. Typically,
"specific binding" discriminates over adventitious binding in a reaction by at least
two-fold, more typically by at least 10-fold, often at least 100-fold. Typically, the
affinity or avidity of a specific binding reaction is at least about 10⁻⁷ M (e.g., at
least about 10⁻⁸ M or 10⁻⁹ M).

[0104] The term "region" as used herein refers to a physically contiguous portion
of the primary structure of a biomolecule. In the case of proteins, a region is
defined by a contiguous portion of the amino acid sequence of that protein.

[0105] The term "domain" as used herein refers to a structure of a biomolecule
that contributes to a known or suspected function of the biomolecule. Domains
may be co-extensive with regions or portions thereof; domains may also include
distinct, non-contiguous regions of a biomolecule. Examples of protein domains
include, but are not limited to, an Ig domain, an extracellular domain, a
transmembrane domain, and a cytoplasmic domain.

[0106] As used herein, the term "molecule" means any compound, including, but
not limited to, a small molecule, peptide, protein, sugar, nucleotide, nucleic acid,
lipid, etc., and such a compound can be natural or synthetic.

[0107] Unless otherwise defined, all technical and scientific terms used herein
have the same meaning as commonly understood by one of ordinary skill in the art
to which this invention pertains. Exemplary methods and materials are described
below, although methods and materials similar or equivalent to those described
herein can also be used in the practice of the present invention and will be apparent
to those of skill in the art. All publications and other references mentioned herein
are incorporated by reference in their entirety. In case of conflict, the present
specification, including definitions, will control. The materials, methods, and
examples are illustrative only and not intended to be limiting.
[0108] Throughout this specification and claims, the word "comprise" or
variations such as "comprises" or "comprising", will be understood to imply the
inclusion of a stated integer or group of integers but not the exclusion of any other
integer or group of integers.

Engineering or Selecting Hosts With Modified Lipid-Linked Oligosaccharides
For The Generation of Human-like N-Glycans

[0109] The invention provides a method for producing a human-like glycoprotein
in a non-human eukaryotic host cell. The method involves making or using a
non-human eukaryotic host cell diminished or depleted in an alg gene activity (i.e., alg
activities, including equivalent enzymatic activities in non-fungal host cells) and
introducing into the host cell at least one glycosidase activity. In a preferred
embodiment, the glycosidase activity is introduced by causing expression of one or
more mannosidase activities within the host cell, for example, by activation of a
mannosidase activity, or by expression from a nucleic acid molecule of a
mannosidase activity, in the host cell.

[0110] In another embodiment, the method involves making or using a host cell
diminished or depleted in the activity of one or more enzymes that transfer a sugar
residue to the 1,6 arm of lipid-linked oligosaccharide precursors (Fig. 1). A host
cell of the invention is selected for or is engineered by introducing a mutation in
one or more of the genes encoding an enzyme that transfers a sugar residue to (e.g.,
mannosylates) the 1,6 arm of a lipid-linked oligosaccharide precursor. The sugar
residue is preferably a glucose, GlcNAc, galactose, sialic acid, fucose or GlcNAc
phosphate residue, and is more preferably mannose. In a preferred embodiment, the
activity of one or more enzymes that mannosylate the 1,6 arm of lipid-linked
oligosaccharide precursors is diminished or depleted. The method may further
comprise the step of introducing into the host cell at least one glycosidase activity
(see below).

[0111] In yet another embodiment, the invention provides a method for
producing a human-like glycoprotein in a non-human host, wherein the
glycoprotein comprises an N-glycan having at least two GlcNAcs attached to a
trimannose core structure.

[0112] In each above embodiment, the method is directed to making a host cell
in which the lipid-linked oligosaccharide precursors are enriched in ManXGlcNAc2
structures, where X is 3, 4 or 5 (Fig. 2). These structures are transferred in the ER
of the host cell onto nascent polypeptide chains by an oligosaccharyl-transferase
and may then be processed by treatment with glycosidases (e.g., α-mannosidases)
and glycosyltransferases (e.g., GnTI) to produce N-glycans having
GlcNAcManXGlcNAc2 core structures, wherein X is 3, 4 or 5, and is preferably 3
(Figs. 2 and 3). As shown in Fig. 2, N-glycans having a GlcNAcManXGlcNAc2
core structure where X is greater than 3 may be converted to
GlcNAcMan3GlcNAc2, e.g., by treatment with an α-1,3 and/or α-1,2-1,3
mannosidase activity, where applicable.

[0113] Additional processing of GlcNAcMan3GlcNAc2 by treatment with
glycosyltransferases (e.g., GnTII) produces GlcNAc2Man3GlcNAc2 core structures
which may then be modified, as desired, e.g., by ex vivo treatment or by
heterologous expression in the host cell of a set of glycosylation enzymes,
including glycosyltransferases, sugar transporters and mannosidases (see below),
to become human-like N-glycans. Preferred human-like glycoproteins which may
be produced according to the invention include those which comprise N-glycans
having seven or fewer, or three or fewer, mannose residues; comprise one or more
sugars selected from the group consisting of galactose, GlcNAc, sialic acid, and
fucose; and comprise at least one oligosaccharide branch comprising the structure
NeuNAc-Gal-GlcNAc-Man.
[0114] In one embodiment, the host cell has diminished or depleted
Dol-P-Man:Man5GlcNAc2-PP-Dol Mannosyltransferase activity, which is an activity
involved in the first mannosylation step from Man5GlcNAc2-PP-Dol to
Man6GlcNAc2-PP-Dol at the luminal side of the ER (e.g., ALG3, Fig. 1; Fig. 2). In
S. cerevisiae, this enzyme is encoded by the ALG3 gene. As described above,
S. cerevisiae cells harboring a leaky alg3-1 mutation accumulate
Man5GlcNAc2-PP-Dol and cells having a deletion in alg3 appear to transfer Man5GlcNAc2
structures onto nascent polypeptide chains within the ER. Accordingly, in this
embodiment, host cells will accumulate N-glycans enriched in Man5GlcNAc2
structures which can then be converted to GlcNAc2Man3GlcNAc2 by treatment
with glycosidases (e.g., with α-1,2 mannosidase, α-1,3 mannosidase or α-1,2-1,3
mannosidase activities) (Fig. 2).

[0115] As described in Example 1, degenerate primers were designed based on
an alignment of Alg3 protein sequences from S. cerevisiae, D. melanogaster and
humans (H. sapiens) (Figs. 4 and 5), and were used to amplify a product from P.
pastoris genomic DNA. The resulting PCR product was used as a probe to identify
and isolate a P. pastoris genomic clone comprising an open reading frame (ORF)
that encodes a protein having 35% overall sequence identity and 53% sequence
similarity to the S. cerevisiae ALG3 gene (Figs. 6 and 7). This P. pastoris gene is
referred to herein as "PpALG3". The ALG3 gene was similarly identified and
isolated from K. lactis (Example 1; Figs. 8 and 9).

[0116] Thus, in another embodiment, the invention provides an isolated nucleic
acid molecule having a nucleic acid sequence comprising or consisting of at least
forty-five, preferably at least 50, more preferably at least 60 and most preferably
75 or more nucleotide residues of the P. pastoris ALG3 gene (Fig. 6) and the K.
lactis ALG3 gene (Fig. 8), and homologs, variants and derivatives thereof. The
invention also provides nucleic acid molecules that hybridize under stringent
conditions to the above-described nucleic acid molecules. Similarly, isolated
polypeptides (including muteins, allelic variants, fragments, derivatives, and
analogs) encoded by the nucleic acid molecules of the invention are provided
(P. pastoris and K. lactis ALG3 gene products are shown in Figs. 6 and 8). In
addition, also provided are vectors, including expression vectors, which comprise a
nucleic acid molecule of the invention, as described further herein.

[0117] Using gene-specific primers, a construct was made to delete the PpALG3
gene from the genome of P. pastoris (Example 1). This strain was used to
generate a host cell depleted in Dol-P-Man:Man5GlcNAc2-PP-Dol
Mannosyltransferase activity and produce lipid-linked Man5GlcNAc2-PP-Dol
precursors which are transferred onto nascent polypeptide chains to produce
N-glycans having a Man5GlcNAc2 carbohydrate structure.

[0118] As described in Example 2, such a host cell may be engineered by
expression of appropriate mannosidases to produce N-glycans having the desired
Man3GlcNAc2 core carbohydrate structure. Expression of GnTs in the host cell
(e.g., by targeting a nucleic acid molecule or a library of nucleic acid molecules as
described below) enables the modified host cell to produce N-glycans having one
or two GlcNAc structures attached to each arm of the Man3 core structure (i.e.,
GlcNAcMan3GlcNAc2 or GlcNAc2Man3GlcNAc2; see Fig. 3). These structures
may be processed further using the methods of the invention to produce
human-like N-glycans on proteins which enter the secretion pathway of the host cell.
[0119] In another embodiment, the host cell has diminished or depleted
dolichyl-P-Man:Man6GlcNAc2-PP-dolichyl α-1,2 mannosyltransferase activity, which is an
α-1,2 mannosyltransferase activity involved in the mannosylation step converting
Man6GlcNAc2-PP-Dol to Man7GlcNAc2-PP-Dol at the luminal side of the ER (see
above and Figs. 1 and 2). In S. cerevisiae, this enzyme is encoded by the ALG9
gene. Cells harboring an alg9 mutation accumulate Man6GlcNAc2-PP-Dol (Fig. 2)
and transfer Man6GlcNAc2 structures onto nascent polypeptide chains within the
ER. Accordingly, in this embodiment, host cells will accumulate N-glycans
enriched in Man6GlcNAc2 structures which can then be processed down to core
Man3 structures by treatment with α-1,2 and α-1,3 mannosidases (see Fig. 3 and
Examples 3 and 4).

[0120] A host cell in which the alg9 gene (or gene encoding an equivalent
activity) has been deleted is constructed (see, e.g., Example 3). Deletion of ALG9
(or ALG12; see below) creates a host cell which produces N-glycans with one or
two additional mannoses, respectively, on the 1,6 arm (Fig. 2). In order to make
the 1,6 core-mannose accessible to N-acetylglucosaminyltransferase II (GnTII)
these mannoses have to be removed by glycosidase(s). ER mannosidase typically
will remove the terminal 1,2 mannose on the 1,6 arm and subsequently
Mannosidase II (alpha 1-3,6 mannosidase) or other mannosidases such as alpha
1,2, alpha 1,3 or alpha 1-2,3 mannosidases (e.g., from Xanthomonas manihotis; see
Example 4) can act upon the 1,6 arm and subsequently GnTII can transfer an
N-acetylglucosamine, resulting in GlcNAc2Man3 (Fig. 2).

[0121] The resulting host cell, which is depleted for alg9p activity, is engineered
to express α-1,2 and α-1,3 mannosidase activity (from one or more enzymes, and
preferably, by expression from a nucleic acid molecule introduced into the host cell
and which expresses an enzyme targeted to a preferred subcellular compartment
(see below)). Example 4 describes the cloning and expression of one such enzyme
from Xanthomonas manihotis.

[0122] In another embodiment, the host cell has diminished or depleted
dolichyl-P-Man:Man7GlcNAc2-PP-dolichyl α-1,6 mannosyltransferase activity, which is an
α-1,6 mannosyltransferase activity involved in the mannosylation step converting
Man7GlcNAc2-PP-Dol to Man8GlcNAc2-PP-Dol (which mannosylates the α-1,6
mannose on the 1,6 arm of the core mannose structure) at the luminal side of the
ER (see above and Figs. 1 and 2). In S. cerevisiae, this enzyme is encoded by the
ALG12 gene. Cells harboring an alg12 mutation accumulate
Man7GlcNAc2-PP-Dol (Fig. 2) and transfer Man7GlcNAc2 structures onto nascent polypeptide chains
within the ER. Accordingly, in this embodiment, host cells will accumulate
N-glycans enriched in Man7GlcNAc2 structures which can then be processed down to
core Man3 structures by treatment with α-1,2 and α-1,3 mannosidases (see Fig. 3
and Examples 3 and 4).

[0123] As described above for alg9 mutant hosts, the resulting host cell, which is
depleted for alg12p activity, is engineered to express α-1,2 and α-1,3 mannosidase
activity (e.g., from one or more enzymes, and preferably, by expression from one
or more nucleic acid molecules introduced into the host cell and which express an
enzyme activity which is targeted to a preferred subcellular compartment (see
below)).
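
The relationship between the alg mutant backgrounds described above and the amount of mannosidase trimming needed to reach the Man3 core can be summarized in a short bookkeeping sketch. The dictionary, function names and naming helper below are hypothetical conveniences for illustration; they only count mannose residues and do not model linkage specificity or enzyme kinetics.

```python
# Mannose count of the structure that accumulates in each mutant background,
# as described in the text: alg3 -> Man5, alg9 -> Man6, alg12 -> Man7.
ACCUMULATED_MANNOSES = {"alg3": 5, "alg9": 6, "alg12": 7}

# The desired core carbohydrate structure is Man3GlcNAc2.
CORE_MANNOSES = 3


def mannoses_to_trim(mutant):
    """Number of mannose residues to remove to reach the Man3 core."""
    if mutant not in ACCUMULATED_MANNOSES:
        raise KeyError(f"unknown mutant background: {mutant}")
    return ACCUMULATED_MANNOSES[mutant] - CORE_MANNOSES


def glycan_name(n_mannose, n_glcnac_caps=0):
    """Build a name such as 'GlcNAc2Man3GlcNAc2' from residue counts.

    n_glcnac_caps is the number of GlcNAc residues added by GnTs (0, 1 or 2).
    """
    prefix = {0: "", 1: "GlcNAc", 2: "GlcNAc2"}[n_glcnac_caps]
    return f"{prefix}Man{n_mannose}GlcNAc2"


if __name__ == "__main__":
    for mutant in ACCUMULATED_MANNOSES:
        start = glycan_name(ACCUMULATED_MANNOSES[mutant])
        print(mutant, start, "->", glycan_name(CORE_MANNOSES),
              "requires removal of", mannoses_to_trim(mutant), "mannoses")
```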
[0124] 

Engineering or Selecting Hosts Optionally Having Decreased Initiating
α-1,6 Mannosyltransferase Activity

[0125] In a preferred embodiment, the method of the invention involves making
or using a host cell which is both (a) diminished or depleted in the activity of an
alg gene or in one or more activities that mannosylate N-glycans on the α-1,6 arm
of the Man3GlcNAc2 ("Man3") core carbohydrate structure; and (b) diminished or
depleted in the activity of an initiating α-1,6-mannosyltransferase, i.e., an initiation
specific enzyme that initiates outer chain mannosylation (on the α-1,3 arm of the
Man3 core structure). In S. cerevisiae, this enzyme is encoded by the OCH1 gene.
Disruption of the och1 gene in S. cerevisiae results in a phenotype in which
N-linked sugars completely lack the poly-mannose outer chain. Previous approaches
for obtaining mammalian-type glycosylation in fungal strains have required
inactivation of OCH1 (see, e.g., Chiba, 1998). Disruption of the initiating
α-1,6-mannosyltransferase activity in a host cell of the invention is optional, however
(depending on the selected host cell), as the Och1p enzyme requires an intact
Man8GlcNAc for efficient mannose outer chain initiation. Thus, the host cells
selected or produced according to this invention, which accumulate lipid-linked
oligosaccharides having seven or fewer mannose residues will, after transfer,
produce hypoglycosylated N-glycans that will likely be poor substrates for Och1p
(see, e.g., Nakayama, 1997).

Engineering or Selecting Hosts Having Increased Glucosyltransferase Activity 
[0126] As discussed above, glucosylated oligosaccharides are thought to be
transferred to nascent polypeptide chains at a much higher rate than their
nonglucosylated counterparts. It appears that substrate recognition by the
oligosaccharyltransferase complex is enhanced by addition of glucose to the
antennae of lipid-linked oligosaccharides. It is thus desirable to create or select
host cells capable of optimal glucosylation of the lipid-linked oligosaccharides. In
such host cells, underglycosylation will be substantially decreased or even
abolished, due to a faster and more efficient transfer of glucosylated Man5
structures onto the nascent polypeptide chain.

[0127] Accordingly, in another embodiment of the invention, the method is
directed to making a host cell in which the lipid-linked N-glycan precursors are
transferred efficiently to the nascent polypeptide chain in the ER. In a preferred
embodiment, transfer is augmented by increasing the level of glucosylation on the
branches of lipid-linked oligosaccharides which, in turn, will make them better
substrates for oligosaccharyltransferase.

[0128] In one preferred embodiment, the invention provides a method for making
a human-like glycoprotein which uses a host cell in which one or more enzymes
responsible for glucosylation of lipid-linked oligosaccharides in the ER have
increased activity. One way to enhance the degree of glucosylation of the
lipid-linked oligosaccharides is to overexpress one or more enzymes responsible for the
transfer of glucose residues onto the antennae of the lipid-linked oligosaccharide.
In particular, increasing α-1,3 glucosyltransferase activity will increase the amount
of glucosylated lipid-linked Man5 structures and will reduce or eliminate the
underglycosylation of secreted proteins. In S. cerevisiae, this enzyme is encoded
by the ALG6 gene.

[0129] Saccharomyces cerevisiae ALG6 and its human counterpart have been
cloned (Imbach, 1999; Reiss, 1996). Due to the evolutionary conservation of the
early steps of glycosylation, ALG6 loci are expected to be homologous between
species and may be cloned based on sequence similarities by anyone skilled in the
art. (The same holds true for cloning and identification of ALG8 and ALG10 loci
from different species.) In addition, different glucosyltransferases from different
species can then be tested to identify the ones with optimal activities.

[0130] The introduction of additional copies of an ALG6 gene and/or the
expression of ALG6 under the control of a strong promoter, such as the GAPDH
promoter, is one of several ways to increase the degree of glucosylation of lipid-linked
oligosaccharides. The ALG6 gene from P. pastoris is cloned and expressed
(Example 5). ALG6 nucleic acid and amino acid sequences are shown in Fig. 25 (S.
cerevisiae) and Fig. 26 (P. pastoris). These sequences are compared to other
eukaryotic ALG6 sequences in Fig. 27.

[0131] Accordingly, another embodiment of the invention provides a method to
enhance the degree of glucosylation of lipid-linked oligosaccharides comprising
the step of increasing alpha-1,3 glucosyltransferase activity in a host cell. The
increase in activity may be achieved by overexpression of nucleic acid sequences
encoding the activity, e.g., by operatively linking the nucleic acid encoding the
activity with one or more heterologous expression control sequences. Preferred
expression control sequences include transcription initiation, termination, promoter
and enhancer sequences; RNA splice donor and polyadenylation signals; mRNA
stabilizing sequences; ribosome binding sites; protein stabilizing sequences; and
protein secretion sequences.
[0132] In another embodiment, the increase in alpha-1,3 glucosyltransferase
activity is achieved by introducing a nucleic acid molecule encoding the activity on
a multi-copy plasmid, using techniques well known to the skilled worker. In yet
another embodiment, the invention provides a method to enhance the degree of
glucosylation of lipid-linked oligosaccharides comprising decreasing the substrate
specificity of oligosaccharyl transferase activity in a host cell. This is achieved by,
for example, subjecting at least one nucleic acid encoding the activity to a technique
such as gene shuffling, in vitro mutagenesis, and error-prone polymerase chain
reaction, all of which are well-known to one of skill in the art. Naturally, ALG8
and ALG10 can be overexpressed in a host cell and tested in a similar fashion.

[0133] Accordingly, in a preferred embodiment, the invention provides a method
for making a human-like glycoprotein using a host cell which is engineered or
selected so that one or more enzymes responsible for glucosylation of lipid-linked
oligosaccharides in the ER have increased activity. In a more preferred
embodiment, the invention uses a host cell that is both (a) diminished or depleted
in the activity of one or more alg gene activities or activities that mannosylate
N-glycans on the α-1,6 arm of the Man3GlcNAc2 ("Man3") core carbohydrate
structure and (b) engineered or selected so that one or more enzymes responsible
for glucosylation of lipid-linked oligosaccharides in the ER have increased activity.
The lipid-linked Man5 structure found in an alg3 mutant background, however, is
not a preferred substrate for Alg6p. Accordingly, the skilled worker may identify
Alg6p, Alg8p and Alg10p with an increased substrate specificity (Gibbs, 2001),
e.g., by subjecting nucleic acids encoding such enzymes to one or more rounds of
gene shuffling, error prone PCR, or in vitro mutagenesis approaches and selecting
for increased substrate specificity in a host cell of interest, using molecular biology
and genetic selection techniques well known to those of skill in the art. It will be
appreciated by the skilled worker that such techniques for improving enzyme
substrate specificities in a selected host strain are not limited to this particular
embodiment of the invention but rather, may be used in any embodiment to
optimize further the production of human-like N-glycans in a non-human host cell.
[0134] As described, once Man5 is transferred onto the nascent polypeptide
chain, expression of suitable α-1,2-mannosidase(s), as provided by the present
invention, will further trim Man5GlcNAc2 structures to yield the desired core
Man3GlcNAc2 structures. α-1,2-mannosidases remove only terminal α-1,2-linked
mannose residues and are expected to recognize the Man5GlcNAc2-
Man7GlcNAc2 specific structures made in alg3, 9 and 12 mutant host cells and in
host cells in which homologs to these genes are mutated.

[0135] As schematically presented in Figure 3, co-expression of appropriate
UDP-sugar-transporter(s) and -transferase(s) will cap the terminal α-1,6 and α-1,3
residues with GlcNAc, resulting in the necessary precursor for mammalian-type
complex and hybrid N-glycosylation: GlcNAc2Man3GlcNAc2. The peptide-bound
N-linked oligosaccharide chain GlcNAc2Man3GlcNAc2 (Figure 3) then serves as a
precursor for further modification to a mammalian-type oligosaccharide structure.
Subsequent expression of galactosyl-transferases and genetically engineering the
capacity to transfer sialic acid will produce a mammalian-type (e.g., human-like)
N-glycan structure.

[0136] A desired host cell according to the invention can be engineered one
enzyme or more than one enzyme at a time. In addition, a library of genes
encoding potentially useful enzymes can be created, and a strain having one or
more enzymes with optimal activities or producing the most "human-like"
glycoproteins can be selected by transforming target host cells with one or more
members of the library. Lower eukaryotes that are able to produce glycoproteins
having the core N-glycan Man3GlcNAc2 are particularly useful because of the ease of
performing genetic manipulations, and safety and efficiency features. In a
preferred embodiment, at least one further glycosylation reaction is performed, ex
vivo or in vivo, to produce a human-like N-glycan. In a more preferred
embodiment, active forms of glycosylating enzymes are expressed in the
endoplasmic reticulum and/or Golgi apparatus of the host cell to produce the
desired human-like glycoprotein.

Host Cells

[0137] A preferred non-human host cell of the invention is a lower eukaryotic
cell, e.g., a unicellular or filamentous fungus, which is diminished or depleted in
the activity of one or more alg gene activities (including an enzymatic activity
which is a homolog or equivalent to an alg activity). Another preferred host cell of
the invention is diminished or depleted in the activity of one or more enzymes
(other than alg activities) that mannosylate the α-1,6 arm of a lipid-linked
oligosaccharide structure.

[0138] While lower eukaryotic host cells are preferred, a wide variety of host
cells having the aforementioned properties are envisioned as being useful in the
methods of the invention. Plant cells, for instance, may be engineered to express a
human-like glycoprotein according to the invention. Likewise, a variety of
non-human, mammalian host cells may be altered to express more human-like
glycoproteins using the methods of the invention. An appropriate host cell can be
engineered, or one of the many such mutants already described in yeasts may be
used. A preferred host cell of the invention, as exemplified herein, is a
hypermannosylation-minus (OCH1) mutant in Pichia pastoris which has further
been modified to delete the alg3 gene. Other preferred hosts are Pichia pastoris
mutants having och1 and alg9 or alg12 mutations.


Formation of complex N-glycans 

[0139] The sequential addition of sugars to the modified, nascent N-glycan
structure involves the successful targeting of glycosyltransferases into the Golgi
apparatus and their successful expression. This process requires the functional
expression, e.g., of GnT I, in the early or medial Golgi apparatus as well as
ensuring a sufficient supply of UDP-GlcNAc (e.g., by expression of a
UDP-GlcNAc transporter).

[0140] To characterize the glycoproteins and to confirm the desired
glycosylation, the glycoproteins were purified, the N-glycans were PNGase-F
released and then analyzed by MALDI-TOF-MS (Example 2). Kringle 3 domain
of human plasminogen was used as the reporter protein. This soluble glycoprotein
was produced in P. pastoris in an alg3, och1 knockout background (Example 2).
[0141] GlcNAcMan5GlcNAc2 was produced as the predominant N-glycan after
addition of human GnT I and K. lactis UDP-GlcNAc transporter (Fig. 16;
Example 2). The mass of this N-glycan is consistent with the mass of
GlcNAcMan5GlcNAc2 at 1463 (m/z). To confirm the addition of the GlcNAc onto
Man5GlcNAc2, a β-N-hexosaminidase digest was performed, which revealed a
peak at 1260 (m/z), consistent with the mass of Man5GlcNAc2 (Fig. 17).
[0142] The N-glycans from the alg3 och1 deletion in one strain PBP3 (Example
2) provided two distinct peaks at 1138 (m/z) and 1300 (m/z), which are consistent
with structures GlcNAcMan3GlcNAc2 and GlcNAcMan4GlcNAc2 (Fig. 18). After
an in vitro α-1,2-mannosidase digestion for redundant mannoses, a peak eluted at
1138 (m/z), which is consistent with GlcNAcMan3GlcNAc2 (Fig. 19). To confirm
the addition of the GlcNAc onto the Man3GlcNAc2 structure, a
β-N-hexosaminidase digest was performed, which revealed a peak at 934 (m/z),
consistent with the mass of Man3GlcNAc2 (Fig. 20).

[0143] The addition of the second GlcNAc onto GlcNAcMan3GlcNAc2 is shown
in Fig. 21. The peak at 1357 (m/z) corresponds to GlcNAc2Man3GlcNAc2. To
confirm the addition of the two GlcNAcs onto the core mannose structure
Man3GlcNAc2, another β-N-hexosaminidase digest was performed, which revealed
a peak at 934 (m/z), consistent with the mass of Man3GlcNAc2 (Fig. 22). This is
conclusive data displaying a complex-type glycoprotein made in yeast cells.
[0144] The in vitro addition of UDP-galactose and β1,4-galactosyltransferase
to the GlcNAc2Man3GlcNAc2 resulted in a peak at 1664 (m/z), which is
consistent with the mass of Gal2GlcNAc2Man3GlcNAc2 (Fig. 23). Finally, the in
vitro addition of CMP-N-acetylneuraminic acid and sialyltransferase resulted in a
peak at 2248 (m/z), which is consistent with the mass of
NANA2Gal2GlcNAc2Man3GlcNAc2 (Fig. 24). The above data support the use of
non-mammalian host cells, which are capable of producing complex human-like
glycoproteins.
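
The peak assignments in paragraphs [0141] to [0144] can be cross-checked with
simple arithmetic. The following Python sketch is not part of the original
disclosure: it computes a theoretical m/z from a glycan composition and lists it
next to the reported peak. The choice of a singly charged sodium adduct, the
monoisotopic residue masses and the name `theoretical_mz` are assumptions made
for illustration; the text reports nominal m/z values only and does not state
the ion species.

```python
# Illustrative sketch only. The [M+Na]+ adduct, the residue masses and all
# names below are assumptions and are not taken from the source text.

# Monoisotopic residue masses in Da (monosaccharide minus water).
RESIDUE_MASS = {
    "Hex": 162.0528,     # mannose or galactose
    "HexNAc": 203.0794,  # N-acetylglucosamine
    "NeuAc": 291.0954,   # N-acetylneuraminic acid (NANA)
}
WATER = 18.0106   # water added back for the free reducing end
SODIUM = 22.9892  # assumed Na+ adduct


def theoretical_mz(hex_n, hexnac_n, neuac_n=0):
    """Return the theoretical m/z of a singly charged [M+Na]+ glycan ion."""
    return (hex_n * RESIDUE_MASS["Hex"]
            + hexnac_n * RESIDUE_MASS["HexNAc"]
            + neuac_n * RESIDUE_MASS["NeuAc"]
            + WATER + SODIUM)


# (structure, Hex, HexNAc, NeuAc, peak reported in the text)
REPORTED = [
    ("Man3GlcNAc2", 3, 2, 0, 934),
    ("GlcNAcMan3GlcNAc2", 3, 3, 0, 1138),
    ("GlcNAcMan4GlcNAc2", 4, 3, 0, 1300),
    ("Man5GlcNAc2", 5, 2, 0, 1260),
    ("GlcNAcMan5GlcNAc2", 5, 3, 0, 1463),
    ("GlcNAc2Man3GlcNAc2", 3, 4, 0, 1357),
    ("Gal2GlcNAc2Man3GlcNAc2", 5, 4, 0, 1664),
    ("NANA2Gal2GlcNAc2Man3GlcNAc2", 5, 4, 2, 2248),
]

if __name__ == "__main__":
    for name, hex_n, hexnac_n, neuac_n, peak in REPORTED:
        calc = theoretical_mz(hex_n, hexnac_n, neuac_n)
        # Print calculated value, reported peak and their difference.
        print(f"{name:30s} calc {calc:9.2f}  reported {peak:5d}  "
              f"diff {peak - calc:+7.2f}")
```

The script prints the difference for every species, so a peak that deviates
from the assumed adduct is easy to spot and can be re-examined with a different
ion assumption.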
Targeting of glycosyl- and galactosyl-transferases to specific organelles.

[0145] Much work has been dedicated to revealing the exact mechanism by
which these enzymes are retained and anchored to their respective organelle.
Although the mechanism is complex, evidence suggests that the stem region,
membrane-spanning region and cytoplasmic tail, individually or in concert,
direct enzymes to the membrane of individual organelles and thereby localize
the associated catalytic domain to that locus.

[0146] The method by which active glycosyltransferases can be expressed and
directed to the appropriate organelle, such that a sequential order of reactions
leading to complex N-glycan formation may occur, is as follows:

(A) Establish a DNA library of regions that are known to encode proteins/peptides
that mediate localization to a particular location in the secretory pathway (ER,
Golgi and trans Golgi network). A limited selection of such enzymes and their
respective locations is shown in Table 1. These sequences may be selected from
the host to be engineered as well as from other related or unrelated organisms.
Generally such sequences fall into three categories: (1) N-terminal sequences
encoding a cytosolic tail (ct), a transmembrane domain (tmd) and part of a
somewhat more ambiguously defined stem region (sr), which together or
individually anchor proteins to the inner (lumenal) membrane of the Golgi,
(2) retrieval signals, which are generally found at the C-terminus, such as the
HDEL or KDEL tetrapeptide, and (3) membrane-spanning nucleotide sugar
transporters, which are known to locate in the Golgi. In the first case, where
the localization region consists of various elements (ct, tmd and sr), the
library is designed such that the ct, the tmd and various parts of the stem
region are represented. This may be accomplished by using PCR primers that bind
to the 5' end of the DNA encoding the cytosolic region and employing a series of
opposing primers that bind to various parts of the stem region. In addition, one
would create fusion protein constructs that encode sugar nucleotide transporters
and known retrieval signals.

(B) A second step involves the creation of a series of fusion protein constructs
that encode the above-mentioned localization sequences and the catalytic domain
of a particular glycosyltransferase cloned in frame to such localization sequence
(e.g. GnT I, GalT, fucosyltransferase or ST). In the case of a sugar nucleotide
transporter fused to a catalytic domain, one may design such constructs such that
the catalytic domain (e.g. GnT I) is either at the N- or the C-terminus of the
resulting polypeptide. The catalytic domain, like the localization sequence, may
be derived from various different sources. The choice of such a catalytic domain
may be guided by the knowledge of the particular environment in which the
catalytic domain is to be active. For example, if a particular glycosyltransferase
is to be active in the late Golgi, and all known enzymes of the host organism in
the late Golgi have a pH optimum of 7.0, or the late Golgi is known to have a
particular pH, one would try to select a catalytic domain that has maximum
activity at that pH. Existing in vivo data on the activity of such enzymes in
particular hosts may also be of use. For example, Schwientek and coworkers showed
that GalT activity can be engineered into the Golgi of S. cerevisiae and showed
that such activity was present by demonstrating the transfer of some Gal to
existing GlcNAc2 in an alg mutant of S. cerevisiae. In addition, one may perform
several rounds of gene shuffling or error-prone PCR to obtain a larger diversity
within the pool of fusion constructs, since it has been shown that single amino
acid mutations may drastically alter the activity of glycoprotein processing
enzymes (Romero et al., 2000). Full-length sequences of glycosyltransferases and
their endogenous anchoring sequence may also be used. In a preferred embodiment,
such localization/catalytic domain libraries are designed to incorporate existing
information on the sequential nature of glycosylation reactions in higher
eukaryotes. In other words, reactions known to occur early in the course of
glycoprotein processing require the targeting of enzymes that catalyze such
reactions to an early part of the Golgi or the ER. For example, the trimming of
Man8GlcNAc2 to Man5GlcNAc2 is an early step in complex N-glycan formation.
Since protein processing is initiated in the ER and then proceeds through the
early, medial and late Golgi, it is desirable to have this reaction occur in the
ER or early Golgi. When designing a library for mannosidase I localization, one
thus attempts to match ER and early Golgi targeting signals with the catalytic
domain of mannosidase I.
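
Steps (A) and (B) amount to a combinatorial pairing of localization sequences
with catalytic domains. The Python sketch below illustrates that pairing under
stated assumptions: the DNA strings are invented placeholders rather than real
gene fragments, and the names `LOCALIZATION`, `CATALYTIC` and `build_library`
are hypothetical. Only the compartment annotations are taken from Table 1.

```python
from itertools import product

# Localization sequences: name -> (compartment per Table 1, placeholder DNA).
# The DNA strings are invented stand-ins, NOT real gene fragments.
LOCALIZATION = {
    "MnsI (S. cerevisiae)": ("ER", "ATGAAGAACTCT"),
    "Och1 (S. cerevisiae)": ("Golgi (cis)", "ATGTCTAGGAAG"),
    "Mnn2 (S. cerevisiae)": ("Golgi (medial)", "ATGCTGCTTACC"),
    "Mnn1 (S. cerevisiae)": ("Golgi (trans)", "ATGTTGGCACTA"),
}

# Catalytic domains: name -> placeholder DNA (also invented).
CATALYTIC = {
    "mannosidase I": "GGTGCTCCAGAA",
    "GnT I": "TCTGTTGCTTTG",
}


def build_library(localization, catalytic, compartments=None):
    """Pair each localization sequence with each catalytic domain.

    If `compartments` is given, only leaders annotated with one of those
    compartments are used. Leaders whose length is not a multiple of three
    are skipped because they would shift the reading frame of the domain.
    """
    library = []
    for (loc_name, (compartment, loc_seq)), (cat_name, cat_seq) in product(
            localization.items(), catalytic.items()):
        if compartments is not None and compartment not in compartments:
            continue
        if len(loc_seq) % 3 != 0:
            continue
        library.append({
            "leader": loc_name,
            "compartment": compartment,
            "domain": cat_name,
            "insert": loc_seq + cat_seq,  # in-frame fusion, leader first
        })
    return library


if __name__ == "__main__":
    # Mannosidase I should act early, so restrict to ER and cis-Golgi leaders.
    early = build_library(
        LOCALIZATION,
        {"mannosidase I": CATALYTIC["mannosidase I"]},
        compartments={"ER", "Golgi (cis)"},
    )
    for construct in early:
        print(construct["leader"], "+", construct["domain"])
```

Restricting `compartments` to the ER and cis-Golgi entries mirrors the matching
of early targeting signals to mannosidase I described in step (B); a real
library would additionally vary the length of the stem region.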

[0147] Upon transformation of the host strain with the fusion construct library,
a selection process is used to identify which particular combination of
localization sequence and catalytic domain in fact has the maximum effect on the
carbohydrate structure found in such host strain. Such selection can be based on
any number of assays or detection methods. They may be carried out manually or
may be automated through the use of high-throughput screening equipment.

[0148] In another example, GnT I activity is required for the maturation of
complex N-glycans, because only after addition of GlcNAc to the terminal α1,3
mannose residue may further trimming of such a structure to the subsequent
intermediate GlcNAcMan3GlcNAc2 structure occur. Mannosidase II is most likely
not capable of removing the terminal α1,3- and α1,6-mannose residues in the
absence of a terminal β1,2-GlcNAc, and thus the formation of complex N-glycans
will not proceed in the absence of GnT I activity (Schachter, 1991).
Alternatively, one may first engineer or select a strain that makes sufficient
quantities of Man5GlcNAc2 as described in this invention by engineering or
selecting a strain deficient in Alg3p activity. In the presence of sufficient
UDP-GlcNAc transporter activity, as may be achieved by engineering or selecting
a strain that has such UDP-GlcNAc transporter activity, GlcNAc can be added to
the terminal α-1,3 residue by GnT I, as in vitro a Man3 structure is recognized
by rat liver GnT I (Moller, 1992).
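
The ordering constraint described above, namely that mannosidase II trims only
after GnT I has added a terminal GlcNAc, can be written as a small rule-based
model. The Python sketch below is an illustration under simplifying
assumptions: a glycan is reduced to a pair (number of mannoses, number of
GlcNAc residues added to the arms), linkage detail is ignored, and all function
names are hypothetical.

```python
# Simplified model: a glycan is (mannose count, GlcNAc residues on the arms).
# The two core GlcNAc residues are implicit. All names are hypothetical.


def gnt1(glycan):
    """GnT I adds the first GlcNAc to a Man5 or a Man3 structure."""
    man, glcnac = glycan
    if glcnac == 0 and man in (3, 5):
        return (man, 1)
    return glycan


def mannosidase2(glycan):
    """Mannosidase II removes two mannoses, but only from GlcNAcMan5."""
    man, glcnac = glycan
    if man == 5 and glcnac == 1:
        return (3, 1)
    return glycan  # no terminal GlcNAc present: no trimming


def gnt2(glycan):
    """GnT II adds the second GlcNAc to GlcNAcMan3."""
    man, glcnac = glycan
    if man == 3 and glcnac == 1:
        return (3, 2)
    return glycan


def process(glycan, enzymes):
    """Expose the glycan to the enzymes in the given order."""
    for enzyme in enzymes:
        glycan = enzyme(glycan)
    return glycan


def name(glycan):
    """Render the pair in the notation used in the text."""
    man, glcnac = glycan
    prefix = "" if glcnac == 0 else "GlcNAc" if glcnac == 1 else f"GlcNAc{glcnac}"
    return f"{prefix}Man{man}GlcNAc2"


if __name__ == "__main__":
    start = (5, 0)  # Man5GlcNAc2
    print(name(process(start, [gnt1, mannosidase2, gnt2])))
    print(name(process(start, [mannosidase2, gnt1, gnt2])))
```

Running the enzymes in the order GnT I, mannosidase II, GnT II reaches the
complex structure, whereas placing mannosidase II first leaves the glycan stuck
at the hybrid intermediate, which is why the localization of each enzyme along
the secretory pathway matters.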

[0149] In another approach, one may incorporate the expression of a UDP-GlcNAc
transporter into the library mentioned above such that the desired construct
will contain: (1) a region by which the transformed construct is maintained in
the cell (e.g. origin of replication or a region that mediates chromosomal
integration), (2) a marker gene that allows for the selection of cells that have
been transformed, including counterselectable and recyclable markers such as
ura3 or T-urf13 (Soderholm, 2001) or other well characterized selection markers
(e.g. his4, bla, Sh ble etc.), (3) a gene encoding a UDP-GlcNAc transporter
(e.g. from K. lactis (Abeijon, 1996) or from H. sapiens (Ishida, 1996)), and
(4) a promoter activating the expression of the above-mentioned
localization/catalytic domain fusion construct library.

[0150] After transformation of the host with the library of fusion constructs
described above, one may screen for those cells that have the highest
concentration of terminal GlcNAc on the cell surface, or secrete the protein
with the highest terminal GlcNAc content. Such a screen may be based on a visual
method, like a staining procedure, the ability to bind specific terminal GlcNAc
binding antibodies or lectins conjugated to a marker (such lectins are available
from E.Y. Laboratories Inc., San Mateo, CA), the reduced ability of specific
lectins to bind to terminal mannose residues, the ability to incorporate a
radioactively labeled sugar in vitro, altered binding to dyes or charged
surfaces, or may be accomplished by using a Fluorescence Assisted Cell Sorting
(FACS) device in conjunction with a fluorophore-labeled lectin or antibody
(Guillen, 1998). It may be advantageous to enrich particular phenotypes within
the transformed population with cytotoxic lectins. U.S. Patent No. 5,595,900
teaches several methods by which cells with desired extracellular carbohydrate
structures may be identified. Repeatedly carrying out this strategy allows for
the sequential engineering of more and more complex glycans in lower eukaryotes.

[0151] After transformation, one may select for transformants that allow for the
most efficient transfer of GlcNAc by GlcNAc Transferase II from UDP-GlcNAc in
an in vitro assay. This screen may be carried out by growing cells harboring the
transformed library under selective pressure on an agar plate and transferring
individual colonies into a 96-well microtiter plate. After growing the cells,
the cells are centrifuged and resuspended in buffer, and after addition of
UDP-GlcNAc and GnT V, the release of UDP is determined either by HPLC or an
enzyme-linked assay for UDP. Alternatively, one may use radioactively labeled
UDP-GlcNAc and GnT V, wash the cells and then look for the release of
radioactive GlcNAc by N-acetylglucosaminidase. All this may be carried out
manually or automated through the use of high-throughput screening equipment.
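
The plate-based screen ends in a simple ranking step. As a hedged sketch, assume
that each well of the 96-well plate yields one UDP-release reading; the well
identifiers, the example readings and the names `well_ids` and `rank_wells` are
invented for illustration and do not come from the source.

```python
import string


def well_ids(rows=8, cols=12):
    """Return the well labels of a 96-well plate, A1 through H12."""
    return [f"{string.ascii_uppercase[r]}{c}"
            for r in range(rows) for c in range(1, cols + 1)]


def rank_wells(udp_release, top_n=5):
    """Return the top_n (well, reading) pairs, highest UDP release first."""
    ranked = sorted(udp_release.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    # Hypothetical readings in arbitrary units for three wells.
    readings = {"A1": 0.2, "A2": 1.5, "B7": 0.9}
    for well, value in rank_wells(readings, top_n=2):
        print(well, value)
```

In practice the readings would come from the HPLC or enzyme-linked UDP assay,
and the highest-ranking colonies would be carried into the next screening round.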

[0152] Transformants that release more UDP in the first assay, or more
radioactively labeled GlcNAc in the second assay, are expected to have a higher
degree of GlcNAcMan3GlcNAc2 (Fig. 3) on their surface and thus constitute the
desired phenotype. Alternatively, one may use any other suitable screen, such
as a lectin binding assay that is able to reveal altered glycosylation patterns
on the surface of transformed cells. In this case the reduced binding of lectins
specific to terminal mannoses may be a suitable selection tool. Galanthus
nivalis lectin binds specifically to terminal α-1,3 mannose, which is expected
to be reduced if sufficient mannosidase II activity is present in the Golgi. One
may also enrich for desired transformants by carrying out a chromatographic
separation step that allows for the removal of cells containing a high terminal
mannose content. This separation step would be carried out with a lectin column
that specifically binds cells with a high terminal mannose content (e.g.
Galanthus nivalis lectin bound to agarose, Sigma, St. Louis, MO) over those that
have a low terminal mannose content. In addition, one may directly create such
fusion protein constructs, as additional information on the localization of
active carbohydrate modifying enzymes in different lower eukaryotic hosts
becomes available in the scientific literature. For example, the prior art
teaches us that human β1,4-GalTr can be fused to the membrane domain of MNT, a
mannosyltransferase from S. cerevisiae, and localized to the Golgi apparatus
while retaining its catalytic activity (Schwientek et al., 1995). If
S. cerevisiae or a related organism is the host to be engineered, one may
directly incorporate such findings into the overall strategy to obtain complex
N-glycans from such a host. Several such gene fragments in P. pastoris have been
identified that are related to glycosyltransferases in S. cerevisiae and thus
could be used for that purpose.

Table 1

Gene or sequence     Organism                    Function                  Location of gene product
MnsI                 S. cerevisiae               mannosidase               ER
Och1                 S. cerevisiae               1,6-mannosyltransferase   Golgi (cis)
Mnn2                 S. cerevisiae               1,2-mannosyltransferase   Golgi (medial)
Mnn1                 S. cerevisiae               1,3-mannosyltransferase   Golgi (trans)
Och1                 P. pastoris                 1,6-mannosyltransferase   Golgi (cis)
2,6 ST               H. sapiens, S. frugiperda   2,6-sialyltransferase     trans-Golgi network
β1,4 Gal T           bovine milk                 UDP-Gal transporter       Golgi
Mnt1                 S. cerevisiae               1,2-mannosyltransferase   Golgi (cis)
HDEL at C-terminus   S. cerevisiae               retrieval signal          ER

Integration Sites

[0153] As one ultimate goal of this genetic engineering effort is a robust
protein production strain that is able to perform well in an industrial
fermentation process, the integration of multiple genes into the host (e.g.,
fungal) chromosome involves careful planning. The engineered strain will most
likely have to be transformed with a range of different genes, and these genes
will have to be transformed in a stable fashion to ensure that the desired
activity is maintained throughout the fermentation process. Any combination of
the following enzyme activities will have to be engineered into the fungal
protein expression host: sialyltransferases, mannosidases, fucosyltransferases,
galactosyltransferases, glucosyltransferases, GlcNAc transferases, ER and Golgi
specific transporters (e.g. syn- and antiport transporters for UDP-galactose and
other precursors), other enzymes involved in the processing of oligosaccharides,
and enzymes involved in the synthesis of activated oligosaccharide precursors
such as UDP-galactose and CMP-N-acetylneuraminic acid. At the same time, a
number of genes which encode enzymes known to be characteristic of non-human
glycosylation reactions will have to be deleted. Such genes and their
corresponding proteins have been extensively characterized in a number of lower
eukaryotes (e.g. S. cerevisiae, T. reesei, A. nidulans etc.), thereby providing
a list of known glycosyltransferases in lower eukaryotes, their activities and
their respective genetic sequence. These genes are likely to be selected from
the group of mannosyltransferases, e.g. 1,3 mannosyltransferases (e.g. MNN1 in
S. cerevisiae) (Graham, 1991), 1,2 mannosyltransferases (e.g. KTR/KRE family
from S. cerevisiae), 1,6 mannosyltransferases (OCH1 from S. cerevisiae),
mannosylphosphate transferases (MNN4 and MNN6 from S. cerevisiae) and
additional enzymes that are involved in aberrant, i.e. non-human, glycosylation
reactions. Many of these genes have in fact been deleted individually, giving
rise to viable phenotypes with altered glycosylation profiles. Examples are
shown in Table 2:

Table 2.

Strain                      Mutant             Structure wild type          Structure mutant   Authors
Schizosaccharomyces pombe   OCH1               Mannan (i.e. Man>9GlcNAc2)   Man8GlcNAc2        Yoko-o et al., 2001
S. cerevisiae               OCH1, MNN1         Mannan (i.e. Man>9GlcNAc2)   Man8GlcNAc2        Nakanishi-Shindo et al., 1993
S. cerevisiae               OCH1, MNN1, MNN4   Mannan (i.e. Man>9GlcNAc2)   Man8GlcNAc2        Chiba et al., 1998



As any strategy to engineer the formation of complex N-glycans into a lower
eukaryote involves both the elimination and the addition of glycosyltransferase
activities, a comprehensive scheme will attempt to coordinate both requirements.
Genes that encode enzymes that are undesirable serve as potential integration
sites for genes that are desirable. For example, 1,6 mannosyltransferase
activity is a hallmark of glycosylation in many known lower eukaryotes. The gene
encoding alpha-1,6 mannosyltransferase (OCH1) has been cloned from
S. cerevisiae, and mutations in the gene give rise to a viable phenotype with
reduced mannosylation. The gene locus encoding alpha-1,6 mannosyltransferase
activity therefore is a prime target for the integration of genes encoding
glycosyltransferase activity. In a similar manner, one can choose a range of
other chromosomal integration sites that, based on a gene disruption event in
that locus, are expected to: (1) improve the cell's ability to glycosylate in a
more human-like fashion, (2) improve the cell's ability to secrete proteins,
(3) reduce proteolysis of foreign proteins and (4) improve other characteristics
of the process that facilitate purification or the fermentation process itself.

Providing sugar nucleotide precursors 

[0154] A hallmark of higher eukaryotic glycosylation is the presence of
galactose, fucose, and a high degree of terminal sialic acid on glycoproteins.
These sugars are not generally found on glycoproteins produced in yeast and
filamentous fungi, and the method discussed above allows for the engineering of
strains that localize glycosyltransferases in the desired organelle. Complex
N-glycan synthesis is a sequential process by which specific sugar residues are
removed and attached to the core oligosaccharide structure. In higher
eukaryotes, this is achieved by having the substrate sequentially exposed to
various processing enzymes. These enzymes carry out specific reactions depending
on their particular location within the entire processing cascade. This
"assembly line" consists of the ER, early, medial and late Golgi, and the trans
Golgi network, all with their specific processing environment. To recreate the
processing of human glycoproteins in the Golgi and ER of lower eukaryotes,
numerous enzymes (e.g. glycosyltransferases, glycosidases, phosphatases and
transporters) have to be expressed and specifically targeted to these
organelles, preferably in a location where they function most efficiently in
relation to their environment as well as to other enzymes in the pathway.

[0155] Several individual glycosyltransferases have been cloned and expressed
in S. cerevisiae (GalT, GnT I), Aspergillus nidulans (GnT I) and other fungi,
without however demonstrating the desired outcome of "humanization" on the
glycosylation pattern of the organisms (Yoshida, 1995; Schwientek, 1995;
Kalsner, 1995). It was speculated that the carbohydrate structure required to
accept sugars by the action of such glycosyltransferases was not present in
sufficient amounts. While this most likely contributed to the lack of complex
N-glycan formation, there are currently no reports of a fungus supplying a
Man5GlcNAc2 structure, having GnT I activity and having UDP-GlcNAc transporter
activity engineered into the fungus. It is the combination of these three
biochemical events that is required for hybrid and complex N-glycan formation.

[0156] In humans, the full range of nucleotide sugar precursors (e.g.
UDP-N-acetylglucosamine, UDP-N-acetylgalactosamine, CMP-N-acetylneuraminic acid,
UDP-galactose, etc.) is generally synthesized in the cytosol and transported
into the Golgi, where they are attached to the core oligosaccharide by
glycosyltransferases. To replicate this process in lower eukaryotes, sugar
nucleoside specific transporters have to be expressed in the Golgi to ensure
adequate levels of nucleoside sugar precursors (Sommers, 1981; Sommers, 1982;
Perez, 1987). A side product of this reaction is either a nucleoside diphosphate
or monophosphate. While monophosphates can be directly exported in exchange for
nucleoside triphosphate sugars by an antiport mechanism, diphospho nucleosides
(e.g. GDP) have to be cleaved by phosphatases (e.g. GDPase) to yield nucleoside
monophosphates and inorganic phosphate prior to being exported. This reaction
appears to be important for efficient glycosylation, as GDPase from
S. cerevisiae has been found to be necessary for mannosylation. However, the
enzyme only has 10% of the activity towards UDP (Berninsone, 1994). Lower
eukaryotes often do not have UDP specific diphosphatase activity in the Golgi
since they do not utilize UDP-sugar precursors for glycoprotein synthesis in the
Golgi.

[0157] Schizosaccharomyces pombe, a yeast found to add galactose residues to
cell wall polysaccharides (from UDP-galactose), was found to have specific
UDPase activity, further suggesting the requirement for such an enzyme
(Berninsone et al., 1994). UDP is known to be a potent inhibitor of
glycosyltransferases, and the removal of this glycosylation side product is
important in order to prevent glycosyltransferase inhibition in the lumen of the
Golgi (Khatara et al., 1974). Thus, one may need to provide for the removal of
UDP, which is expected to accumulate in the Golgi of such engineered strains
(Berninsone, 1995; Beaudet, 1998).

[0158] In another example, 2,3 sialyltransferase and 2,6 sialyltransferase cap
galactose residues with sialic acid in the trans-Golgi and TGN of humans,
leading to a mature form of the glycoprotein. To reengineer this processing step
into a metabolically engineered yeast or fungus will require
(1) 2,3-sialyltransferase activity and (2) a sufficient supply of CMP-N-acetyl
neuraminic acid in the late Golgi of yeast. To obtain sufficient
2,3-sialyltransferase activity in the late Golgi, the catalytic domain of a
known sialyltransferase (e.g. from humans) has to be directed to the late Golgi
in fungi (see above). Likewise, transporters have to be engineered that allow
the transport of CMP-N-acetyl neuraminic acid into the late Golgi. There is
currently no indication that fungi synthesize sufficient amounts of CMP-N-acetyl
neuraminic acid, not to mention the transport of such a sugar-nucleotide into
the Golgi. Consequently, to ensure the adequate supply of substrate for the
corresponding glycosyltransferases, one has to metabolically engineer the
production of CMP-sialic acid into the fungus.

Methods for providing sugar nucleotide precursors to the Golgi apparatus: 

UDP-N-acetyl-glucosamine 
[0159] The cDNA of the human UDP-N-acetylglucosamine transporter, which was
recognized through a homology search in the expressed sequence tags database
(dbEST), was cloned by Ishida and coworkers (Ishida, 1999). Guillen and
coworkers have cloned the mammalian Golgi membrane transporter for
UDP-N-acetylglucosamine by phenotypic correction, with cDNA from canine kidney
cells (MDCK), of a recently characterized Kluyveromyces lactis mutant deficient
in Golgi transport of the above nucleotide sugar (Guillen, 1998). Their results
demonstrate that the mammalian Golgi UDP-GlcNAc transporter gene has all of
the necessary information for the protein to be expressed and targeted
functionally to the Golgi apparatus of yeast and that two proteins with very
different amino acid sequences may transport the same solute within the same
Golgi membrane (Guillen, 1998).

GDP-Fucose

[0160] The rat liver Golgi membrane GDP-fucose transporter has been identified
and purified by Puglielli, L. and C. B. Hirschberg (Puglielli, 1999). The
corresponding gene has not been identified; however, N-terminal sequencing can
be used for the design of oligonucleotide probes specific for the corresponding
gene. These oligonucleotides can be used as probes to clone the gene encoding
the GDP-fucose transporter.

UDP-Galactose 

[0161] Two heterologous genes, gma12(+) encoding alpha 1,2-galactosyltransferase
(alpha 1,2 GalT) from Schizosaccharomyces pombe and (hUGT2) encoding human
UDP-galactose (UDP-Gal) transporter, have been functionally expressed in
S. cerevisiae to examine the intracellular conditions required for
galactosylation. Correlation between protein galactosylation and UDP-galactose
transport activity indicated that an exogenous supply of UDP-Gal transporter,
rather than alpha 1,2 GalT, played a key role for efficient galactosylation in
S. cerevisiae (Kainuma, 1999). Likewise, a UDP-galactose transporter from
S. pombe was cloned (Aoki, 1999; Segawa, 1999).

CMP-N-acetylneuraminic acid (CMP-Sialic acid)

[0162] Human CMP-sialic acid transporter (hCST) has been cloned and
expressed in Lec 8 CHO cells (Aoki, 1999; Eckhardt, 1997). The functional
expression of the murine CMP-sialic acid transporter was achieved in
Saccharomyces cerevisiae (Berninsone, 1997). Sialic acid has been found in some
fungi; however, it is not clear whether the chosen host system will be able to
supply sufficient levels of CMP-sialic acid. Sialic acid can be either supplied
in the medium or, alternatively, fungal pathways involved in sialic acid
synthesis can be integrated into the host genome.



Diphosphatases

[0163] When sugars are transferred onto a glycoprotein, either a nucleoside
diphosphate or monophosphate is released from the sugar nucleotide precursors.
While monophosphates can be directly exported in exchange for nucleoside
triphosphate sugars by an antiport mechanism, diphospho nucleosides (e.g. GDP)
have to be cleaved by phosphatases (e.g. GDPase) to yield nucleoside
monophosphates and inorganic phosphate prior to being exported. This reaction
appears to be important for efficient glycosylation, as GDPase from
S. cerevisiae has been found to be necessary for mannosylation. However, the
enzyme only has 10% of the activity towards UDP (Berninsone, 1994). Lower
eukaryotes often do not have UDP specific diphosphatase activity in the Golgi
since they do not utilize UDP-sugar precursors for glycoprotein synthesis in the
Golgi. Schizosaccharomyces pombe, a yeast found to add galactose residues to
cell wall polysaccharides (from UDP-galactose), was found to have specific
UDPase activity, further suggesting the requirement for such an enzyme
(Berninsone, 1994). UDP is known to be a potent inhibitor of
glycosyltransferases, and the removal of this glycosylation side product is
important in order to prevent glycosyltransferase inhibition in the lumen of the
Golgi (Khatara et al. 1974).






Expression Of GnTs To Produce Complex N-glycans 



Expression Of GnT-III To Boost Antibody Functionality

[0164] The addition of an N-acetylglucosamine to the GlcNAcMan3GlcNAc2
structure by N-acetylglucosaminyltransferases II and III yields a so-called
bisected N-glycan GlcNAc3Man3GlcNAc2 (Fig. 3). This structure has been
implicated in greater antibody-dependent cellular cytotoxicity (ADCC) (Umana et
al. 1999). Re-engineering glycoforms of immunoglobulins expressed by mammalian
cells is a tedious and cumbersome task. Especially in the case of GnTIII, where
over-expression of this enzyme has been implicated in growth inhibition, methods
involving regulated (inducible) gene expression had to be employed to produce
immunoglobulins with bisected N-glycans (Umana et al. 1999a, 1999b).

[0165] Accordingly, in another embodiment, the invention provides systems and
methods for producing human-like N-glycans having bisecting
N-acetylglucosamines (GlcNAcs) on the core mannose structure. In a preferred
embodiment, the invention provides a system and method for producing
immunoglobulins having bisected N-glycans. The systems and methods described
herein will not suffer from previous problems, e.g., cytotoxicity associated
with overexpression of GnTIII or ADCC, as the host cells of the invention are
engineered and selected to be viable and preferably robust cells which produce
N-glycans having substantially modified human-type glycoforms such as
GlcNAc2Man3GlcNAc2. Thus, addition of a bisecting N-acetylglucosamine in a
host cell of the invention will have a negligible effect on the growth phenotype
or viability of those host cells.

[0166] In addition, previous work (Umana) has shown that there is no linear
correlation between GnTIII expression levels and the degree of ADCC. Finding
the optimal expression level in mammalian cells and maintaining it throughout an
FDA approved fermentation process seems to be a challenge. However, in cells of
the invention, such as fungal cells, finding a promoter of appropriate strength
to establish a robust, reliable and optimal GnTIII expression level is a
comparatively easy task for one of skill in the art.

[0167] A host cell such as a yeast strain capable of producing glycoproteins
with bisecting N-glycans is engineered according to the invention by introducing
into the host cell a GnTIII activity (Example 6). Preferably, the host cell is
transformed with a nucleic acid that encodes GnTIII (see, e.g., Fig. 32) or a
domain thereof having enzymatic activity, optionally fused to a heterologous
cell signal targeting peptide (e.g., using the libraries and associated methods
of the invention). Host cells engineered to express GnTIII will produce higher
antibody titers than mammalian cells are capable of. They will also produce
antibodies with higher potency with respect to ADCC.

[0168] Antibodies produced by mammalian cell lines transfected with GnTIII
have been shown to be as effective as antibodies produced by non-transfected
cell lines, but at a 10-20 fold lower concentration (Davies et al. 2001). An
increase of productivity of the production vehicle of the invention over
mammalian systems by a factor of twenty, and a ten-fold increase of potency,
will result in a net productivity improvement of two hundred. The invention thus
provides a system and method for producing high titers of an antibody having
high potency (e.g., up to several orders of magnitude more potent than what can
currently be produced). The system and method are safe and provide high potency
antibodies at low cost in short periods of time. Host cells engineered to
express GnT III according to the invention produce immunoglobulins having
bisected N-glycans at rates of at least 50 mg/liter/day to at least
500 mg/liter/day. In addition, each immunoglobulin (Ig) molecule (comprising
bisecting GlcNAcs) is more potent than the same Ig molecule produced without
bisecting GlcNAcs.
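
The two-hundred-fold figure is the product of two independent factors. The
minimal Python sketch below reproduces that arithmetic using only the numbers
stated above; the function names and the 100-liter culture volume in the usage
line are hypothetical and serve only to show how the stated daily rates scale
with volume.

```python
def net_productivity_gain(titer_factor, potency_factor):
    """Combine a fold increase in titer with a fold increase in potency."""
    return titer_factor * potency_factor


def daily_output_mg(rate_mg_per_liter_day, volume_liters):
    """Scale a volumetric production rate to a given culture volume."""
    return rate_mg_per_liter_day * volume_liters


if __name__ == "__main__":
    # Twenty-fold productivity and ten-fold potency, as stated in the text.
    print(net_productivity_gain(20, 10))
    # Lower bound of the stated rate in a hypothetical 100-liter culture.
    print(daily_output_mg(50, 100))
```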

Cloning and expression of GnT-IV and GnT-V

[0169] All branching structures in complex N-glycans are synthesized on a
common core-pentasaccharide (Man3GlcNAc2 or Man alpha1-6(Man alpha1-3)Man
beta1-4 GlcNAc beta1-4 GlcNAc beta1-4 or Man3GlcNAc2) by N-acetylglucosamine
transferases (GnTs) -I to -VI (Schachter H et al. (1989) Methods Enzymol
179:351-97). Current understanding of the biosynthesis of more highly branched
N-glycans suggests that after the action of GnTII (generation of
GlcNAc2Man3GlcNAc2 structures) GnTIV transfers GlcNAc from UDP-GlcNAc in
beta1,4 linkage to the Man alpha1,3 Man beta1,4 arm of GlcNAc2Man3GlcNAc2
N-glycans (Allen SD et al. (1984) J Biol Chem. Jun 10;259(11):6984-90; and
Gleeson PA and Schachter H (1983) J Biol Chem 25;258(10):6162-73), resulting
in a triantennary agalacto sugar chain. This N-glycan (GlcNAc beta1-2 Man
alpha1-6(GlcNAc beta1-2 Man alpha1-3) Man beta1-4 GlcNAc beta1-4 GlcNAc
beta1,4 Asn) is a common substrate for GnT-III and -V, leading to the synthesis
of bisected, tri- and tetra-antennary structures. The action of GnTIII results
in a bisected N-glycan, whereas GnTV catalyzes the addition of beta1-6 GlcNAc
to the alpha1-6 mannosyl core, creating the beta1-6 branch. Addition of
galactose and sialic acid to these branches leads to the generation of a fully
sialylated complex N-glycan.

[0170] Branched complex N-glycans have been implicated in the physiological
activity of therapeutic proteins, such as human erythropoietin (hEPO). Human
EPO having bi-antennary structures has been shown to have a low activity,
whereas hEPO having tetra-antennary structures resulted in slower clearance from
the bloodstream and thus in higher activity (Misaizu T et al. (1995) Blood Dec
1;86(11):4097-104).

[0171] With DNA sequence information, the skilled worker can clone DNA
molecules encoding GnT IV and/or V activities (Example 6; Figs. 33 and 34).
Using standard techniques well-known to those of skill in the art, nucleic acid
molecules encoding GnT IV or V (or encoding catalytically active fragments
thereof) may be inserted into appropriate expression vectors under the
transcriptional control of promoters and other expression control sequences
capable of driving transcription in a selected host cell of the invention, e.g., a
fungal host such as Pichia sp., Kluyveromyces sp. and Aspergillus sp., as described
herein, such that one or more of these mammalian GnT enzymes may be actively
expressed in a host cell of choice for production of a human-like complex
glycoprotein.


[0172] The following are examples which illustrate the compositions and
methods of this invention. These examples should not be construed as limiting:
the examples are included for the purposes of illustration only.

EXAMPLE 1

Identification, cloning and deletion of the ALG3 gene in P. pastoris and K. lactis.
[0173] Degenerate primers were generated based on an alignment of Alg3
protein sequences from S. cerevisiae, H. sapiens, and D. melanogaster and were
used to amplify an 83 bp product from P. pastoris genomic DNA:
5'-GGTGTTTTGTTTTCTAGATCTTTGCAYTAYCARTT-3' and
5'-AGAATTTGGTGGGTAAGAATTCCARCACCAYTCRTG-3'. The resulting
PCR product was cloned into the pCR2.1 vector (Invitrogen, Carlsbad, CA) and
sequence analysis revealed homology to known ALG3/RHK1/NOT56 homologs
(Genbank NC_001134.2, AF309689, NC_003424.1). Subsequently, 1929 bp
upstream and 2738 bp downstream of the initial PCR product were amplified from
a P. pastoris genomic DNA library (Boehm, T. Yeast 1999 May;15(7):563-72)
using the internal oligonucleotides
5'-CCTAAGCTGGTATGCGTTCTCTTTGCCATATC-3' and
5'-GCGGCATAAACAATAATAGATGCTATAAAG-3' along with T3
(5'-AATTAACCCTCACTAAAGGG-3') and T7 (5'-GTAATACGACTCACTATAGGGC-3')
(Integrated DNA Technologies, Coralville, IA)
in the backbone of the library bearing plasmid lambda ZAP II (Stratagene, La
Jolla, CA). The resulting fragments were cloned into the pCR2.1-TOPO vector
(Invitrogen) and sequenced. From this sequence, a 1395 bp ORF was identified
that encodes a protein with 35% identity and 53% similarity to the S. cerevisiae
ALG3 gene (using BLAST programs). The gene was named PpALG3.
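The two degenerate primers above contain the ambiguity codes Y and R. As a minimal sketch, assuming the standard IUPAC nucleotide ambiguity table, the following expands each degenerate primer into the individual sequences it represents; the helper names are illustrative and do not come from the application.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

FORWARD = "GGTGTTTTGTTTTCTAGATCTTTGCAYTAYCARTT"
REVERSE = "AGAATTTGGTGGGTAAGAATTCCARCACCAYTCRTG"


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count


def expand(primer: str) -> list[str]:
    """List every non-degenerate sequence encoded by the primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]


for name, primer in (("forward", FORWARD), ("reverse", REVERSE)):
    print(name, degeneracy(primer), "variants")
```

Each primer has three two-fold degenerate positions, so each represents a pool of eight oligonucleotides.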

[0174] The sequence of PpALG3 was used to create a set of primers to generate a
deletion construct of the PpALG3 gene by PCR overlap (Davidson et al., 2002
Microbiol. 148(Pt 8):2607-15). The primers below were used to amplify 1 kb regions
5' and 3' of the PpALG3 ORF and the KAN^R gene, respectively:
RCD142 (5'-CCACATCATCCGTGCTACATATAG-3'),
RCD144 (5'-ACGAGGCAAGCTAAACAGATCTCGAAGTATCGAGGGTTATCCAG-3'),
RCD145 (5'-CCATCCAGTGTCGAAAACGAGCCAATGGTTCATGTCTATAAATC-3'),
RCD147 (5'-AGCCTCAGCGCCAACAAGCGATGG-3'),
RCD143 (5'-CTGGATAACCCTCGATACTTCGAGATCTGTTTAGCTTGCCTCGT-3'), and
RCD146 (5'-GATTTATAGACATGAACCATTGGCTCGTTTTCGACACTGGATGG-3').

Subsequently, primers RCD142 and RCD147 were used to overlap the three
resulting PCR products into a single 3.6 kb alg3::KAN^R deletion allele.
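In a PCR overlap design, the primers at each junction are expected to be reverse complements of one another so that adjacent fragments can anneal. The sketch below checks this for the junction primers listed above; the pairing of RCD143 with RCD144 and of RCD145 with RCD146 is inferred here from the sequences themselves and is an assumption, not a statement from the application.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

PRIMERS = {
    "RCD143": "CTGGATAACCCTCGATACTTCGAGATCTGTTTAGCTTGCCTCGT",
    "RCD144": "ACGAGGCAAGCTAAACAGATCTCGAAGTATCGAGGGTTATCCAG",
    "RCD145": "CCATCCAGTGTCGAAAACGAGCCAATGGTTCATGTCTATAAATC",
    "RCD146": "GATTTATAGACATGAACCATTGGCTCGTTTTCGACACTGGATGG",
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_overlap_pair(name_a: str, name_b: str) -> bool:
    """True when the two named primers are exact reverse complements."""
    return reverse_complement(PRIMERS[name_a]) == PRIMERS[name_b]


for pair in (("RCD143", "RCD144"), ("RCD145", "RCD146")):
    print(pair, is_overlap_pair(*pair))
```

A check of this kind is a quick way to catch transcription errors in long overlap primers before ordering them.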

Identification, cloning and deletion of the ALG3 gene in K. lactis.
[0175] The ALG3p sequences from S. cerevisiae, Drosophila melanogaster,
Homo sapiens, etc. were aligned with K. lactis sequences (PENDANT EST
database). Regions of high homology that were in common homologs but distinct
in exact sequence from the homologs were used to create pairs of degenerate
primers that were directed against genomic DNA from the K. lactis strain MG1/2
(Bianchi et al., 1987). In the case of ALG3, PCR amplification with primers KAL-1
(5'-ATCCTTTACCGATGCTGTAT-3') and KAL-2
(5'-ATAACAGTATGTGTTACACGCGTGTAG-3') resulted in a product that was
cloned and sequenced and the predicted translation was shown to have a high
degree of homology to Alg3p proteins (>50% to S. cerevisiae Alg3p).

[0176] The PCR product was used to probe a Southern blot of genomic DNA
from K. lactis strain MG1/2 with high stringency (Sambrook et al., 1989).
Hybridization was observed in a pattern consistent with a single gene. This
Southern blot was used to map the genomic loci. Genomic fragments were cloned
by digesting genomic DNA and ligating those fragments in the appropriate
size-range into pUC19 to create a K. lactis subgenomic library. This subgenomic
library was transformed into E. coli and several hundred clones were tested by
colony PCR using primers KAL-1 and KAL-2. The clones containing the
predicted KlALG3 and KlALG61 genes were sequenced and open reading frames
identified.

[0177] Primers for construction of an alg3::NAT^R deletion allele, using a PCR
overlap method (Davidson et al., 2002), were designed and the resulting deletion
allele was transformed into two K. lactis strains and NAT-resistant colonies
selected. These colonies were screened by PCR and transformants were obtained
in which the ALG3 ORF was replaced with the och1::NAT^R mutant allele.

EXAMPLE 2 

Generation of an alg3/och1 mutant strain expressing an alpha-1,2-Mannosidase,
GnTI and GnTII for production of a human-like glycoprotein.

[0178] The 1215 bp open reading frame of the P. pastoris OCH1 gene as well as
2685 bp upstream and 1175 bp downstream was amplified by PCR (B. K. Choi et
al., submitted to Proc. Natl. Acad. Sci. USA 2002; see also WO 02/00879; each of
which is incorporated herein by reference), cloned into the pCR2.1-TOPO vector
(Invitrogen) and designated pBK9. To create an och1 knockout strain containing
multiple auxotrophic markers, 100 ug of pJN329, a plasmid containing an
och1::URA3 mutant allele flanked with SfiI restriction sites, was digested with SfiI
and used to transform P. pastoris strain JC308 (Cereghino et al. Gene 263 (2001)
159-169) by electroporation. Following incubation on defined medium lacking
uracil for 10 days at room temperature, 1000 colonies were picked and re-streaked.
URA+ clones that were unable to grow at 37°C, but grew at room temperature,
were subjected to colony PCR to test for the correct integration of the och1::URA3
mutant allele. One clone that exhibited the expected PCR pattern was designated
YJN153. The Kringle 3 domain of human plasminogen (K3) was used as a model
protein. A Neo^R marked plasmid containing the K3 gene was transformed into
strain YJN153 and a resulting strain, expressing K3, was named BK64-1 (B. K.
Choi et al., submitted to Proc. Natl. Acad. Sci. USA 2002).
[0179] Plasmid pPB103, containing the Kluyveromyces lactis MNN2-2 gene,
encoding a Golgi UDP-N-acetylglucosamine transporter, was constructed by
cloning a blunt BglII-HindIII fragment from vector pDL02 (Abeijon et al. (1996)
Proc. Natl. Acad. Sci. U.S.A. 93:5963-5968) into BglII and BamHI digested and
blunt ended pBLADE-SX containing the P. pastoris ADE1 gene (Cereghino et al.
(2001) Gene 263:159-169). This plasmid was linearized with EcoNI and
transformed into strain BK64-1 by electroporation and one strain confirmed to
contain the MNN2-2 gene by PCR analysis was named PBP1.

[0180] A library of mannosidase constructs was generated, comprising in-frame
fusions of the leader domains of several type I or type II membrane proteins from
S. cerevisiae and P. pastoris fused with the catalytic domains of several
alpha-1,2-mannosidase genes from human, mouse, fly, worm and yeast sources (see,
e.g., WO 02/00879, incorporated herein by reference). This library was created in
a P. pastoris HIS4 integration vector and screened by linearizing with SalI,
transforming by electroporation into strain PBP1, and analyzing the glycans
released from the K3 reporter protein. One active construct chosen was a chimera
of the 988-1296 nucleotides (C-terminus) of the yeast SEC12 gene fused with an
N-terminal deletion of the mouse alpha-1,2-mannosidase IA (MmMannIA) gene, which
was missing the 187 nucleotides. A P. pastoris strain expressing this construct was
named PBP2.

[0181] A library of GnTI constructs was generated, comprising in-frame fusions
of the same leader library with the catalytic domains of GnTI genes from human,
worm, frog and fly sources (WO 02/00879). This library was created in a P.
pastoris ARG4 integration vector and screened by linearizing with AatII,
transforming by electroporation into strain PBP2, and analyzing the glycans
released from K3. One active construct chosen was a chimera of the first 120 bp of
the S. cerevisiae MNN9 gene fused to a deletion of the human GnTI gene, which
was missing the first 154 bp. A P. pastoris strain expressing this construct was
named PBP3.

[0182] Subsequently, a P. pastoris alg3::KAN^R deletion construct was generated
as described above. Approximately 5ug of the resulting PCR product was
transformed into strain PBP3 and colonies were selected on YPD medium
containing 200ug/ml G418. One strain out of 20 screened by PCR was confirmed
to contain the correct integration of the alg3::KAN^R mutant allele and lack the
wild-type allele. This strain was named RDP27.
[0183] Finally, a library of GnTII constructs was generated, which was
comprised of in-frame fusions of the leader library with the catalytic domains of
GnTII genes from human and rat sources (WO 02/00879). This library was
created in a P. pastoris integration vector containing the NAT^R gene conferring
resistance to the drug nourseothricin. The library plasmids were linearized with
EcoRI, transformed into strain RDP27 by electroporation, and the resulting strains
were screened by analysis of the released glycans from purified K3.






Materials 

[0184] MOPS, sodium cacodylate, manganese chloride, UDP-galactose and
CMP-N-acetylneuraminic acid were from Sigma. TFA was from Aldrich.
Recombinant rat alpha-2,6-sialyltransferase from Spodoptera frugiperda and
beta-1,4-galactosyltransferase from bovine milk were from Calbiochem. Protein
N-glycosidase F, mannosidases, and oligosaccharides were from Glyko (San Rafael,
CA). DEAE ToyoPearl resin was from TosoHaas. Metal chelating "HisBind"
resin was from Novagen (Madison, WI). 96-well lysate-clearing plates were from
Promega (Madison, WI). Protein-binding 96-well plates were from Millipore
(Bedford, MA). Salts and buffering agents were from Sigma (St. Louis, MO).
MALDI matrices were from Aldrich (Milwaukee, WI).


Protein Purification 

[0185] Kringle 3 was purified using a 96-well format on a Beckman BioMek
2000 sample-handling robot (Beckman/Coulter, Rancho Cucamonga, CA). Kringle
3 was purified from expression media using a C-terminal hexa-histidine tag. The
robotic purification is an adaptation of the protocol provided by Novagen for their
HisBind resin. Briefly, a 150uL settled volume of resin is poured into the
wells of a 96-well lysate-binding plate, washed with 3 volumes of water and
charged with 5 volumes of 50mM NiSO4 and washed with 3 volumes of binding
buffer (5mM imidazole, 0.5M NaCl, 20mM Tris-HCl pH7.9). The protein
expression media is diluted 3:2, media/PBS (60mM PO4, 16mM KCl, 822mM
NaCl pH7.4) and loaded onto the columns. After draining, the columns are
washed with 10 volumes of binding buffer and 6 volumes of wash buffer (30mM
imidazole, 0.5M NaCl, 20mM Tris-HCl pH7.9) and the protein is eluted with 6
volumes of elution buffer (1M imidazole, 0.5M NaCl, 20mM Tris-HCl pH7.9).
The eluted glycoproteins are evaporated to dryness by lyophilization.
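The purification protocol above states each wash as a multiple of the settled resin volume. A minimal sketch converting those multiples into per-well liquid volumes, assuming that one "volume" means the 150uL settled resin volume (the step labels and function name are illustrative only):

```python
RESIN_UL = 150  # settled resin volume per well, taken from the protocol text

# (step label, multiple of the settled resin volume), in protocol order.
STEPS = [
    ("water wash", 3),
    ("metal charge", 5),
    ("binding buffer equilibration", 3),
    ("binding buffer wash", 10),
    ("wash buffer", 6),
    ("elution buffer", 6),
]


def step_volumes_ul(resin_ul: float = RESIN_UL) -> list[tuple[str, float]]:
    """Return (step, volume in uL) pairs for one well."""
    return [(label, multiple * resin_ul) for label, multiple in STEPS]


for label, volume in step_volumes_ul():
    print(f"{label}: {volume} uL")
```

Multiplying each value by 96 gives a rough estimate of the buffer needed for a full plate, before any dead volume required by the liquid handler.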

Release of N-linked Glycans 

[0186] The glycans are released and separated from the glycoproteins by a
modification of a previously reported method (Papac, et al. A. J. S. (1998)
Glycobiology 8, 445-454). The wells of a 96-well MultiScreen IP (Immobilon-P
membrane) plate (Millipore) are wetted with 100uL of methanol, washed with
3X150uL of water and 50uL of RCM buffer (8M urea, 360mM Tris, 3.2mM
EDTA pH8.6), draining with gentle vacuum after each addition. The dried protein
samples are dissolved in 30uL of RCM buffer and transferred to the wells
containing 10uL of RCM buffer. The wells are drained and washed twice with
RCM buffer. The proteins are reduced by addition of 60uL of 0.1M DTT in RCM
buffer for 1hr at 37°C. The wells are washed three times with 300uL of water and
carboxymethylated by addition of 60uL of 0.1M iodoacetic acid for 30min in the
dark at room temperature. The wells are again washed three times with water and
the membranes blocked by the addition of 100uL of 1% PVP 360 in water for 1hr
at room temperature. The wells are drained and washed three times with 300uL of
water and deglycosylated by the addition of 30uL of 10mM NH4HCO3 pH 8.3
containing one milliunit of N-glycanase (Glyko). After 16 hours at 37°C, the
solution containing the glycans was removed by centrifugation and evaporated to
dryness.

Matrix Assisted Laser Desorption Ionization Time of Flight Mass
Spectrometry

[0187] Molecular weights of the glycans were determined using a Voyager DE
PRO linear MALDI-TOF (Applied Biosciences) mass spectrometer using delayed
extraction. The dried glycans from each well were dissolved in 15uL of water and
0.5uL spotted on stainless steel sample plates and mixed with 0.5uL of S-DHB
matrix (9mg/mL of dihydroxybenzoic acid, 1mg/mL of 5-methoxysalicylic acid in
1:1 water/acetonitrile 0.1% TFA) and allowed to dry.

[0188] Ions were generated by irradiation with a pulsed nitrogen laser (337nm)
with a 4ns pulse time. The instrument was operated in the delayed extraction mode
with a 125ns delay and an accelerating voltage of 20kV. The grid voltage was
93.00%, guide wire voltage was 0.10%, the internal pressure was less than
5 x 10^-7 torr, and the low mass gate was 875Da. Spectra were generated from the
sum of 100-200 laser pulses and acquired with a 2 GHz digitizer. Man5
oligosaccharide was used as an external molecular weight standard. All spectra
were generated with the instrument in the positive ion mode. The estimated mass
accuracy of the spectra was 0.5%.

Materials: 

[0189] MOPS, sodium cacodylate, manganese chloride, UDP-galactose and
CMP-N-acetylneuraminic acid were from Sigma, Saint Louis, MO. Trifluoroacetic
acid (TFA) was from Sigma/Aldrich, Saint Louis, MO. Recombinant rat
alpha-2,6-sialyltransferase from Spodoptera frugiperda and
beta-1,4-galactosyltransferase from bovine milk were from Calbiochem, San Diego,
CA.

beta-N-acetylhexosaminidase Digestion
[0190] The glycans were released and separated from the glycoproteins by a
modification of a previously reported method (Papac, et al. A. J. S. (1998)
Glycobiology 8, 445-454). After the proteins were reduced and carboxymethylated,
and the membranes blocked, the wells were washed three times with water. The
protein was deglycosylated by the addition of 30 ul of 10 mM NH4HCO3 pH 8.3
containing one milliunit of N-glycanase (Glyko, Novato, CA). After 16 hr at 37°C,
the solution containing the glycans was removed by centrifugation and evaporated
to dryness. The glycans were then dried in an SC210A speed vac (Thermo Savant,
Holbrook, NY). The dried glycans were put in 50 mM NH4Ac pH 5.0 at 37°C
overnight and 1mU of hexosaminidase (Glyko, Novato, CA) was added.


Galactosyltransferase Reaction 

[0191] Approximately 2mg of protein (r-K3:hPg [PBP6-5]) was purified by
nickel-affinity chromatography, extensively dialyzed against 0.1% TFA and
lyophilized to dryness. The protein was redissolved in 150uL of 50mM MOPS,
20mM MnCl2, pH7.4. After addition of 32.5ug (533nmol) of UDP-galactose and
4mU of beta-1,4-galactosyltransferase, the sample was incubated at 37°C for 18
hours. The samples were then dialyzed against 0.1% TFA for analysis by MALDI-
TOF mass spectrometry.

[0192] The spectrum of the protein reacted with galactosyltransferase showed an
increase in mass consistent with the addition of two galactose moieties when
compared with the spectrum of a similar protein sample incubated without enzyme.
Protein samples were next reduced, carboxymethylated and deglycosylated with
PNGase F. The recovered N-glycans were analyzed by MALDI-TOF mass
spectrometry. The mass of the predominant glycan from the galactosyltransferase
reacted protein was greater than that of the control glycan by a mass consistent
with the addition of two galactose moieties (325.4 Da).

Sialyltransferase Reaction

[0193] After resuspending the (galactosyltransferase reacted) proteins in 10uL of
50mM sodium cacodylate buffer pH6.0, 300ug (488nmol) of CMP-N-
acetylneuraminic acid (CMP-NANA) dissolved in 15uL of the same buffer, and
5uL (2mU) of recombinant alpha-2,6 sialyltransferase were added. After incubation
at 37°C for 15 hours, an additional 200ug of CMP-NANA and 1mU of
sialyltransferase were added. The protein samples were incubated for an additional
8 hours and then dialyzed and analyzed by MALDI-TOF-MS as above.
[0194] The spectrum of the glycoprotein reacted with sialyltransferase showed an
increase in mass when compared with that of the starting material (the protein after
galactosyltransferase reaction). The N-glycans were released and analyzed as
above. The increase in mass of the two ion-adducts of the predominant glycan was
consistent with the addition of two sialic acid residues (580 and 583Da).
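The mass shifts reported for the galactosyltransferase and sialyltransferase reactions can be compared with the shifts expected from adding two residues. The sketch below uses average residue masses (monosaccharide minus water) of 162.14 Da for a hexose and 291.26 Da for N-acetylneuraminic acid; these constants and the helper names are assumptions introduced for illustration and are not values stated in the application.

```python
HEXOSE_RESIDUE_DA = 162.14  # average mass of a C6H10O5 residue (assumed constant)
NEUAC_RESIDUE_DA = 291.26   # average mass of a C11H17NO8 residue (assumed constant)


def expected_shift(residue_da: float, count: int) -> float:
    """Mass added by `count` glycosidically linked residues."""
    return residue_da * count


def deviation(observed_da: float, residue_da: float, count: int) -> float:
    """Observed shift minus expected shift, in Da."""
    return observed_da - expected_shift(residue_da, count)


OBSERVED = [
    ("two galactose", 325.4, HEXOSE_RESIDUE_DA),
    ("two sialic acid, adduct 1", 580.0, NEUAC_RESIDUE_DA),
    ("two sialic acid, adduct 2", 583.0, NEUAC_RESIDUE_DA),
]

for label, observed, residue in OBSERVED:
    print(f"{label}: observed {observed} Da, "
          f"expected {expected_shift(residue, 2):.2f} Da, "
          f"deviation {deviation(observed, residue, 2):+.2f} Da")
```

Each deviation is a few daltons at most, which is small next to the 0.5% mass accuracy quoted for the instrument when applied to glycans above the 875Da low mass gate.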

EXAMPLE 3

Identification, cloning and deletion of the
ALG9 and ALG12 genes in P. pastoris

[0195] Similar to Example 1, the ALG9p and ALG12 sequences, respectively,
from S. cerevisiae, Drosophila melanogaster, Homo sapiens, etc., are aligned and
regions of high homology are used to design degenerate primers. These primers
are employed in a PCR reaction on genomic DNA from P. pastoris. The
resulting initial PCR product is subcloned, sequenced and used to probe a Southern
blot of genomic DNA from P. pastoris with high stringency (Sambrook et al.,
1989). Hybridization is observed. This Southern blot is used to map the genomic
loci. Genomic fragments are cloned by digesting genomic DNA and ligating those
fragments in the appropriate size-range into pUC19 to create a P. pastoris
subgenomic library. This subgenomic library is transformed into E. coli and
several hundred clones are tested by colony PCR, using primers designed based on
the sequence of the initial PCR product. The clones containing the predicted genes
are sequenced and open reading frames identified. Primers for construction of an
alg9::NAT^R deletion allele, using a PCR overlap method (Davidson et al., 2002),
are designed. The resulting deletion allele is transformed into two P. pastoris
strains and NAT resistant colonies are selected. These colonies are screened by
PCR and transformants obtained in which the ALG9 ORF is replaced with the
och1::NAT^R mutant allele. See generally, Cipollo et al. Glycobiology 2002
(12)11:749-762; Chantret et al. J. Biol. Chem. Jul. 12, 2002 (277)28:25815-25822;
Cipollo et al. J. Biol. Chem. Feb. 11, 2000 (275)6:4267-4277; Burda et al. Proc.
Natl. Acad. Sci. U.S.A. July 1996 (93):7160-7165; Karaoglu et al. Biochemistry
2001, 40, 12193-12206; Grimme et al. J. Biol. Chem. July 20, 2001
(276)29:27731-27739; Verostek et al. J. Biol. Chem. June 5, 1993 (268)16:12095-
12103; Huffaker et al. Proc. Natl. Acad. Sci. U.S.A. Dec. 1983 (80):7466-7470.


EXAMPLE 4 

Identification, cloning and expression of Alpha 1,2-3 Mannosidase From
Xanthomonas Manihotis

[0196] The alpha 1,2-3 Mannosidase from Xanthomonas Manihotis has two
activities: an alpha-1,2 and an alpha-1,3 mannosidase. The methods of the
invention may also use two independent mannosidases having these activities,
which may be similarly identified and cloned from a selected organism of interest.

[0197] As described by Landry et al., alpha-mannosidases can be purified from
Xanthomonas sp., such as Xanthomonas manihotis. X. manihotis can be purchased
from the American Type Culture Collection (ATCC catalog number 49764)
(Xanthomonas axonopodis Starr and Garces pathovar manihotis deposited as
Xanthomonas manihotis (Arthaud-Berthet) Starr). Enzymes are purified from
crude cell-extracts as previously described (Wong-Madden, S.T. and Landry, D.
(1995) Purification and characterization of novel glycosidases from the bacterial
genus Xanthomonas; and Landry, D. US Patent US 6,300,113 B1 Isolation and
composition of novel Glycosidases). After purification of the mannosidase, one of
several methods is used to obtain peptide sequence tags (see, e.g., Quadroni
M et al. (2000). A method for the chemical generation of N-terminal peptide
sequence tags for rapid protein identification. Anal Chem (2000) Mar
1;72(5):1006-14; Wilkins MR et al. Rapid protein identification using N-terminal
"sequence tag" and amino acid analysis. Biochem Biophys Res Commun. (1996)
Apr 25;221(3):609-13; and Tsugita A. (1987) Developments in protein
microsequencing. Adv Biophys (1987) 23:81-113).

[0198] Sequence tags generated using a method above are then used to generate
sets of degenerate primers using methods well-known to the skilled worker.
Degenerate primers are used to prime DNA amplification in polymerase chain
reactions (e.g., using Taq polymerase kits according to manufacturers'
instructions) to amplify DNA fragments. The amplified DNA fragments are used
as probes to isolate DNA molecules comprising the gene encoding a desired
mannosidase, e.g., using standard Southern DNA hybridization techniques to
identify and isolate (clone) genomic pieces encoding the enzyme of interest. The
genomic DNA molecules are sequenced and putative open reading frames and
coding sequences are identified. A suitable expression construct encoding the
glycosidase of interest can then be generated using methods described herein and
well-known in the art.

[0199] Nucleic acid fragments comprising sequences encoding alpha 1,2-3
mannosidase activity (or catalytically active fragments thereof) are cloned into
appropriate expression vectors for expression, and preferably targeted expression,
of these activities in an appropriate host cell according to the methods set forth
herein.






EXAMPLE 5 

Identification, cloning and expression of the ALG6 gene in P. pastoris

[0200] Similar to Example 1, the ALG6p sequences from S. cerevisiae,
Drosophila melanogaster, Homo sapiens, etc., are aligned and regions of high
homology are used to design degenerate primers. These primers are employed in a
PCR reaction on genomic DNA from P. pastoris. The resulting initial PCR
product is subcloned, sequenced and used to probe a Southern blot of genomic
DNA from P. pastoris with high stringency (Sambrook et al., 1989). Hybridization
is observed. This Southern blot is used to map the genomic loci. Genomic
fragments are cloned by digesting genomic DNA and ligating those fragments in
the appropriate size-range into pUC19 to create a P. pastoris subgenomic library.
This subgenomic library is transformed into E. coli and several hundred clones are
tested by colony PCR, using primers designed based on the sequence of the initial
PCR product. The clones containing the predicted genes are sequenced and open
reading frames identified. Primers for construction of an alg6::NAT^R deletion
allele, using a PCR overlap method (Davidson et al., 2002), are designed and the
resulting deletion allele is transformed into two P. pastoris strains and NAT
resistant colonies selected. These colonies are screened by PCR and transformants
are obtained in which the ALG6 ORF is replaced with the och1::NAT^R mutant
allele. See, e.g., Imbach et al. Proc. Natl. Acad. Sci. U.S.A. June 1999 (96)6982-
6987.

[0201] Nucleic acid fragments comprising sequences encoding Alg6p (or
catalytically active fragments thereof) are cloned into appropriate expression
vectors for expression, and preferably targeted expression, of these activities in an
appropriate host cell according to the methods set forth herein. The cloned ALG6
gene can be brought under the control of any suitable promoter to achieve
overexpression. Even expression of the gene under the control of its own promoter
is possible. Expression from multicopy plasmids will generate high levels of
expression ("overexpression").



EXAMPLE 6 

Cloning and Expression Of GnT III To Produce
Bisecting GlcNAcs Which Boost Antibody Functionality

A. Background 

[0202] The addition of an N-acetylglucosamine to the GlcNAc2Man3GlcNAc2
structure by N-acetylglucosaminyltransferase III yields a so-called bisected
N-glycan (see Figure 3). This structure has been implicated in greater
antibody-dependent cellular cytotoxicity (ADCC) (Umana et al. 1999).
[0203] A host cell such as a yeast strain capable of producing glycoproteins with
bisected N-glycans is engineered according to the invention, by introducing into
the host cell a GnTIII activity. Preferably, the host cell is transformed with a
nucleic acid that encodes GnTIII (e.g., a mammalian GnTIII such as the murine
GnT III shown in Fig. 32) or a domain thereof having enzymatic activity,
optionally fused to a heterologous cell signal targeting peptide (e.g., using the
libraries and associated methods of the invention).

[0204] IgGs consist of two heavy-chains (VH, CH1, CH2 and CH3 in Figure 30),
interconnected in the hinge region through three disulfide bridges, and two light
chains (VL, CL in Figure 30). The light chains (domains VL and CL) are linked by
another disulfide bridge to the CH1 portion of the heavy chain and together with the
CH1 and VH fragment make up the so-called Fab region. Antigens bind to the
terminal portion of the Fab region. The Fc region of IgGs consists of the CH3, the
CH2 and the hinge region and is responsible for the exertion of so-called effector
functions (see below).

[0205] The primary function of antibodies is binding to an antigen. However,
unless binding to the antigen directly inactivates the antigen (such as in the case of
bacterial toxins), mere binding is meaningless unless so-called effector-functions
are triggered. Antibodies of the IgG subclass exert two major effector-functions:
the activation of the complement system and induction of phagocytosis. The
complement system consists of a complex group of serum proteins involved in
controlling inflammatory events, in the activation of phagocytes and in the lytic
destruction of cell membranes. Complement activation starts with binding of the
C1 complex to the Fc portion of two IgGs in close proximity. C1 consists of one
molecule, C1q, and two molecules, C1r and C1s. Phagocytosis is initiated through
an interaction between the IgG's Fc fragment and Fc-gamma-receptors (FcγRI, II
and III in Figure 30). Fc receptors are primarily expressed on the surface of
effector cells of the immune system, in particular macrophages, monocytes,
myeloid cells and dendritic cells.

[0206] The CH2 portion harbors a conserved N-glycosylation site at asparagine
297 (Asn297). The Asn297 N-glycans are highly heterogeneous and are known to
affect Fc receptor binding and complement activation. Only a minority (i.e., about
15-20%) of IgGs bears a disialylated, and 3-10% have a monosialylated N-glycan
(reviewed in Jefferis, R., Glycosylation of human IgG Antibodies. BioPharm,
2001). Interestingly, the minimal N-glycan structure shown to be necessary for
fully functional antibodies capable of complement activation and Fc receptor
binding is a pentasaccharide with terminal N-acetylglucosamine residues
(GlcNAc2Man3) (reviewed in Jefferis, R., Glycosylation of human IgG Antibodies.
BioPharm, 2001). Antibodies with less than a GlcNAc2Man3 N-glycan or no
N-glycosylation at Asn297 might still be able to bind an antigen but most likely will
not activate the crucial downstream events such as phagocytosis and complement
activation. In addition, antibodies with fungal-type N-glycans attached to Asn297
will in all likelihood elicit an immune-response in a mammalian organism which
will render that antibody useless as a therapeutic glycoprotein.

B. Cloning And Expression Of GnTIII

The DNA fragment encoding part of the mouse GnTIII protein lacking the TM
domain is PCR amplified from murine (or other mammalian) genomic DNA using
forward 5'-TCCTGGCGCGCCTTCCCGAGAGAACTGGCCTCCCTC-3' and
reverse 5'-AATTAATTAACCCTAGCCCTCCGCTGTATCCAACTTG-3'
primers. Those primers include AscI and PacI restriction sites that will be used for
cloning into the vector suitable for the fusion with leader library.
The nucleic acid and amino acid sequence of murine GnTIII is shown in Fig. 32.
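The restriction sites carried by the GnTIII primers can be located with a simple string search. This is a minimal sketch, assuming the commonly cited recognition sequences GGCGCGCC for AscI and TTAATTAA for PacI; the dictionary and function names are illustrative and not part of the application.

```python
# Recognition sequences assumed for the two enzymes named in the text.
SITES = {"AscI": "GGCGCGCC", "PacI": "TTAATTAA"}

FORWARD = "TCCTGGCGCGCCTTCCCGAGAGAACTGGCCTCCCTC"
REVERSE = "AATTAATTAACCCTAGCCCTCCGCTGTATCCAACTTG"


def find_sites(primer: str) -> dict[str, int]:
    """Map each enzyme whose site occurs in the primer to its 0-based position."""
    return {name: primer.find(site) for name, site in SITES.items() if site in primer}


print("forward:", find_sites(FORWARD))
print("reverse:", find_sites(REVERSE))
```

With these sequences, the AscI site lies near the 5' end of the forward primer and the PacI site near the 5' end of the reverse primer, which places one site on each end of the amplified fragment for directional cloning.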

C. Cloning of immunoglobulin encoding sequences

[0207] Protocols for the cloning of the variable regions of antibodies, including
primer sequences, have been published previously. Sources of antibodies and
encoding genes can be, among others, in vitro immunized human B cells (see, e.g.,
Borrebaeck, C.A. et al. (1988) Proc. Natl. Acad. Sci. USA 85, 3995-3999), peripheral
blood lymphocytes or single human B cells (see, e.g., Lagerkvist, A.C. et al.
(1995) Biotechniques 18, 862-869; and Terness, P. et al. (1997) Hum. Immunol. 56,
17-27) and transgenic mice containing human immunoglobulin loci, allowing the
creation of hybridoma cell-lines.

[0208] Using standard recombinant DNA techniques, antibody-encoding nucleic
acid sequences can be cloned. Sources for the genetic information encoding
immunoglobulins of interest are typically total RNA preparations from cells of
interest, such as blood lymphocytes or hybridoma cell lines. For example, by
employing a PCR based protocol with specific primers, variable regions can be
cloned via reverse transcription initiated from a sequence-specific primer
hybridizing to the IgG CH1 domain site and a second primer encoding amino acids
111-118 of the murine kappa constant region. The VH and VK encoding cDNAs
will then be amplified as previously published (see, e.g., Graziano, R.F. et al.
(1995) J Immunol. 155(10): p. 4996-5002; Welschof, M. et al. (1995) J. Immunol.
Methods 179, 203-214; and Orlandi, R. et al. (1988) Proc. Natl. Acad. Sci. USA 86:
3833). Cloning procedures for whole immunoglobulins (heavy and light chains)
have also been published (see, e.g., Buckel, P. et al. (1987) Gene 51:13-19;
Recinos A 3rd et al. (1994) Gene 149: 385-386; (1995) Gene Jun 9;158(2):311-2;
and Recinos A 3rd et al. (1994) Gene Nov 18;149(2):385-6). Additional protocols
for the cloning and generation of antibody fragment and antibody expression
constructs have been described in Antibody Engineering, R. Kontermann and S.
Dübel (2001), Editors, Springer Verlag: Berlin Heidelberg New York.
[0209] Fungal expression plasmids encoding heavy and light chains of
immunoglobulins have been described (see, e.g., Abdel-Salam, H.A. et al. (2001)
Appl. Microbiol. Biotechnol. 56: 157-164; and Ogunjimi, A.A. et al. (1999)
Biotechnology Letters 21: 561-567). One can thus generate expression plasmids
harboring the constant regions of immunoglobulins. To facilitate the cloning of
variable regions into these expression vectors, suitable restriction sites can be
placed in close proximity to the termini of the variable regions. The constant
regions can be constructed in such a way that the variable regions can be easily
fused in-frame to them by a simple restriction-digest / ligation experiment. Figure 31
shows a schematic overview of such an expression construct, designed in a
modular way, allowing easy exchange of promoters, transcriptional terminators,
integration targeting domains and even selection markers.

[0210] As shown in Figure 31, VL as well as VH domains of choice can be easily
cloned in-frame with the CL and CH regions, respectively. Initial integration is
targeted to the P. pastoris AOX locus (or homologous locus in another fungal cell)
and the methanol-inducible AOX promoter will drive expression. Alternatively,
any other desired constitutive or inducible promoter cassette may be used. Thus, if
desired, the 5'AOX and 3'AOX regions as well as transcriptional terminator (TT)
fragments can be easily replaced with different TT, promoter and integration
targeting domains to optimize expression. Initially, the alpha-factor secretion
signal with the standard KEX protease site is employed to facilitate secretion of
heavy and light chains. The properties of the expression vector may be further
refined using standard techniques.

[0211] An Ig expression vector such as the one described above is introduced
into a host cell of the invention that expresses GnTIII, preferably in the Golgi
apparatus of the host cell. The Ig molecules expressed in such a host cell comprise
N-glycans having bisecting GlcNAcs.

EXAMPLE 7 

Cloning and expression of GnT-IV (UDP-GlcNAc:alpha-1,3-D-mannoside
beta-1,4-N-acetylglucosaminyltransferase IV) and
GnT-V (beta 1-6-N-acetylglucosaminyltransferase)

[0212] GnT-IV-encoding cDNAs were isolated from bovine and human cells
(Minowa, M.T. et al. (1998) J. Biol. Chem. 273 (19), 11556-11562; and
Yoshida, A. et al. (1999) Glycobiology 9 (3), 303-310). The DNA fragments
encoding full length and a part of the human GnT-IV protein (Figure 33) lacking
the TM domain are PCR amplified from the cDNA library using forward
5'-AATGAGATGAGGCTCCGCAATGGAACTG-3',
5'-CTGATTGCTTATCAACGAGAATTCCTTG-3', and reverse
5'-TGTTGGTTTCTCAGATGATCAGTTGGTG-3' primers, respectively.
The resulting PCR products are cloned and sequenced.
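Basic properties of the GnT-IV primers quoted above (length, GC content, and the template-strand sequence that the reverse primer corresponds to) can be tabulated with a few lines of code. The sketch below is illustrative only; the dictionary keys and the helper names `reverse_complement` and `gc_fraction` are hypothetical labels introduced here, and no annealing temperature or other experimental parameter is implied.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# GnT-IV primers copied from the paragraph above, written 5' to 3'.
# The key names are hypothetical labels, not terms from the source.
PRIMERS = {
    "forward_full_length": "AATGAGATGAGGCTCCGCAATGGAACTG",
    "forward_truncated": "CTGATTGCTTATCAACGAGAATTCCTTG",
    "reverse": "TGTTGGTTTCTCAGATGATCAGTTGGTG",
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC = {gc_fraction(seq):.0%}")
    # The reverse primer anneals to the coding strand, so its reverse
    # complement is the sequence as it reads on the coding strand.
    print("reverse primer on coding strand:", reverse_complement(PRIMERS["reverse"]))
```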

[0213] Similarly, genes encoding GnT-V protein have been isolated from several
mammalian species, including mouse (see, e.g., Alverez, K. et al. Glycobiology
12 (7), 389-394 (2002)). The DNA fragments encoding full length and a part of
the mouse GnT-V protein (Figure 34) lacking the TM domain are PCR amplified
from the cDNA library using forward
5'-AGAGAGAGATGGCTTTCTTTTCTCCCTGG-3',
5'-AAATCAAGTGGATGAAGGACATGTGGC-3', and reverse
5'-AGCGATGCTATAGGCAGTCTTTGCAGAG-3' primers, respectively. The
resulting PCR products are cloned and sequenced.

[0214] Nucleic acid fragments comprising sequences encoding GnT IV or V (or 
catalytically active fragments thereof) are cloned into appropriate expression 
vectors for expression, and preferably targeted expression, of these activities in an 
appropriate host cell according to the methods set forth herein. 

What is Claimed is:

1. A method for producing a human-like glycoprotein in a non-human
eukaryotic host cell comprising the step of diminishing or depleting the activity of
one or more enzymes in the host cell that transfers a sugar residue to the 1,6 arm of
a lipid-linked oligosaccharide structure.

2. The method of claim 1, further comprising the step of introducing into the
host cell at least one glycosidase activity.

3. The method of claim 2, wherein at least one glycosidase activity is a
mannosidase activity.

4. The method of claim 1, further comprising producing an N-glycan.

5. The method of claim 4, wherein the N-glycan has a GlcNAcManXGlcNAc2
structure wherein X is 3, 4 or 5.

6. The method of claim 5, further comprising the step of expressing within the
host cell one or more enzyme activities, selected from glycosidase and
glycosyltransferase activities, to produce a GlcNAc2Man3GlcNAc2 structure.

7. The method of claim 6, wherein the activity is selected from α-1,2
mannosidase, α-1,3 mannosidase and GnTII activities.

8. The method of claim 1, wherein at least one diminished or depleted enzyme
is selected from the group consisting of an enzyme having
dolichyl-P-Man:Man5GlcNAc2-PP-dolichyl alpha-1,3 mannosyltransferase activity; an
enzyme having dolichyl-P-Man:Man6GlcNAc2-PP-dolichyl alpha-1,2
mannosyltransferase activity and an enzyme having
dolichyl-P-Man:Man7GlcNAc2-PP-dolichyl alpha-1,6 mannosyltransferase activity.






9. The method of claim 1, wherein the diminished or depleted enzyme has
dolichyl-P-Man:Man5GlcNAc2-PP-dolichyl alpha-1,3 mannosyltransferase
activity.

10. The method of claim 1, wherein the enzyme is diminished or depleted by
mutation of a host cell gene encoding the enzymatic activity.

11. The method of claim 10, wherein the mutation is a partial or total deletion
of a host cell gene encoding the enzymatic activity.

12. The method of claim 1, wherein the glycoprotein comprises N-glycans
having seven or fewer mannose residues.

13. The method of claim 1, wherein the glycoprotein comprises N-glycans
having three or fewer mannose residues.

14. The method of claim 1, wherein the glycoprotein comprises one or more
sugars selected from the group consisting of galactose, GlcNAc, sialic acid, and
fucose.

15. The method of claim 1, wherein the glycoprotein comprises at least one
oligosaccharide branch comprising the structure NeuNAc-Gal-GlcNAc-Man.

16. The method of claim 1, wherein the host is a lower eukaryotic cell.

17. The method of claim 1, wherein the host cell is selected from the group
consisting of Pichia pastoris, Pichia finlandica, Pichia trehalophila, Pichia
koclamae, Pichia membranaefaciens, Pichia opuntiae, Pichia thermotolerans,
Pichia salictaria, Pichia guercuum, Pichia pijperi, Pichia stiptis, Pichia
methanolica, Pichia sp., Saccharomyces cerevisiae, Saccharomyces sp., Hansenula
polymorpha, Kluyveromyces sp., Candida albicans, Aspergillus nidulans,
Aspergillus niger, Aspergillus oryzae, Trichoderma reesei, Chrysosporium
lucknowense, Fusarium sp., Fusarium gramineum, Fusarium venenatum and
Neurospora crassa.

18. The method of claim 1, wherein the host cell is further deficient in
expression of initiating α-1,6 mannosyltransferase activity.

19. The method of claim 18, wherein the host cell is an OCH1 mutant of P.
pastoris.

20. The method of claim 1, wherein the host cell expresses GnTI and
UDP-GlcNAc transporter activities.

21. The method of claim 1, wherein the host cell expresses a UDP- or GDP-
specific diphosphatase activity.

22. The method of claim 1, further comprising the step of isolating the
glycoprotein from the host.

23. The method of claim 22, further comprising the step of subjecting the
isolated glycoprotein to at least one further glycosylation reaction in vitro,
subsequent to its isolation from the host.

24. The method of claim 1, further comprising the step of introducing into the
host a nucleic acid molecule encoding one or more enzymes involved in the
production of GlcNAcMan3GlcNAc2 or GlcNAc2Man3GlcNAc2.

25. The method of claim 24, wherein at least one of the enzymes has
mannosidase activity.

26. The method of claim 25, wherein the enzyme has an α-1,2-mannosidase
activity and is derived from mouse, human, Lepidoptera, Aspergillus nidulans, C.
elegans, D. melanogaster, or Bacillus sp.

27. The method of claim 25, wherein the enzyme has an α-1,3-mannosidase
activity.

28. The method of claim 24, wherein at least one enzyme has
glycosyltransferase activity.

29. The method of claim 28, wherein the glycosyltransferase activity is selected
from the group consisting of GnTI and GnTII.

30. The method of claim 24, wherein at least one enzyme is localized by
forming a fusion protein between a catalytic domain of the enzyme and a cellular
targeting signal peptide.

31. The method of claim 30, wherein the fusion protein is encoded by at least
one genetic construct formed by the in-frame ligation of a DNA fragment encoding
a cellular targeting signal peptide with a DNA fragment encoding a glycosylation
enzyme or catalytically active fragment thereof.

32. The method of claim 31, wherein the encoded targeting signal peptide is
derived from a member of the group consisting of mannosyltransferases,
diphosphatases, proteases, GnT I, GnT II, GnT III, GnT IV, GnT V, GnT VI,
GalT, FT, and ST.

33. The method of claim 31, wherein the catalytic domain encodes a
glycosidase or glycosyltransferase that is derived from a member of the group
consisting of GnT I, GnT II, GnT III, GnT IV, GnT V, GnT VI, GalT,
Fucosyltransferase and ST, and wherein the catalytic domain has a pH optimum
within 1.4 pH units of the average pH optimum of other representative enzymes in
the organelle in which the enzyme is localized, or has optimal activity at a pH
between 5.1 and 8.0.

34. The method of claim 31, wherein the nucleic acid molecule encodes one or
more enzymes selected from the group consisting of UDP-GlcNAc transferase,
UDP-galactosyltransferase, GDP-fucosyltransferase, CMP-sialyltransferase,
UDP-GlcNAc transporter, UDP-galactose transporter, GDP-fucose transporter,
CMP-sialic acid transporter, and nucleotide diphosphatases.

35. The method of claim 31, wherein the host expresses GnTI and
UDP-GlcNAc transporter activities.

36. The method of claim 31, wherein the host expresses a UDP- or GDP-
specific diphosphatase activity.

37. The method of claim 1, further comprising the step of introducing into a
host that is deficient in dolichyl-P-Man:Man5GlcNAc2-PP-dolichyl alpha-1,3
mannosyltransferase activity a nucleic acid molecule encoding one or more
enzymes for production of a GlcNAcMan4GlcNAc2 carbohydrate structure.

38. The method of claim 1, further comprising the step of introducing into a
host that is deficient in dolichyl-P-Man:Man6GlcNAc2-PP-dolichyl alpha-1,2
mannosyltransferase or dolichyl-P-Man:Man7GlcNAc2-PP-dolichyl alpha-1,6
mannosyltransferase activity a nucleic acid molecule encoding one or more
enzymes for production of a GlcNAcMan4GlcNAc2 carbohydrate structure.

39. The method of claim 37 or 38, wherein the nucleic acid molecule encodes
at least one enzyme selected from the group consisting of an α-1,2 mannosidase,
UDP-GlcNAc transporter and GnTI.

40. The method of claim 39, further comprising the step of introducing into the
deficient host cell a nucleic acid molecule encoding an α-1,3 or an α-1,2/α-1,3
mannosidase activity for the conversion of the GlcNAc1Man4GlcNAc2 structure to
a GlcNAc1Man3GlcNAc2 structure.

41. The method of claim 1, further comprising the step of introducing into the
host a nucleic acid molecule encoding one or more enzymes for production of a
GlcNAc2Man3GlcNAc2 carbohydrate structure.

42. The method of claim 41, wherein at least one enzyme is GnTII.

43. The method of claim 1, further comprising the step of introducing into the
host cell at least one nucleic acid molecule encoding at least one mammalian
glycosylation enzyme selected from the group consisting of a glycosyltransferase,
fucosyltransferase, galactosyltransferase, N-acetylgalactosaminyltransferase,
N-acetylglucosaminyltransferase and sulfotransferase.

44. The method of claim 1, comprising the step of transforming host cells with
a DNA library to produce a genetically mixed cell population expressing at least
one glycosylation enzyme derived from the library, wherein the library comprises
at least two different genetic constructs, at least one of which comprises a DNA
fragment encoding a cellular targeting signal peptide ligated in-frame with a DNA
fragment encoding a glycosylation enzyme or catalytically active fragment thereof.

45. A host cell produced by the method of claim 1 or 44.

46. A human-like glycoprotein produced by the method of claim 1 or 44.

47. A nucleic acid molecule comprising or consisting of at least forty-five
consecutive nucleotide residues of Fig. 6 (P. pastoris ALG3 gene).

48. A vector comprising a nucleic acid molecule of claim 47.

49. A host cell comprising a nucleic acid molecule of claim 47.

50. A P. pastoris cell in which the sequences of Fig. 6 (P. pastoris ALG3
gene) are mutated whereby the glycosylation pattern of the cell is altered.

51. A method to enhance the degree of glucosylation of lipid-linked
oligosaccharides comprising the step of increasing alpha-1,3 glucosyltransferase
activity in a host cell.

52. A method to enhance the degree of glucosylation of lipid-linked
oligosaccharides comprising decreasing the substrate specificity of oligosaccharyl
transferase activity in a host cell.

53. A method for producing in a non-mammalian host cell an immunoglobulin
polypeptide having an N-glycan comprising a bisecting GlcNAc, the method
comprising the step of expressing in the host cell a GnTIII activity.

54. A non-mammalian host cell that produces an immunoglobulin having an
N-glycan comprising a bisecting GlcNAc.

55. An immunoglobulin produced by the host cell of claim 54.

56. A method for producing in a non-human host cell a polypeptide having an
N-glycan comprising a bisecting GlcNAc, the method comprising the step of
expressing in the host cell a GnTIII activity.

57. A non-human host cell that produces a polypeptide having an N-glycan
comprising a bisecting GlcNAc.

58. A polypeptide produced by the host cell of claim 57.

59. A method for producing a human-like glycoprotein in a non-human
eukaryotic host cell comprising the step of diminishing or depleting from the host
cell an alg gene activity and introducing into the host cell at least one glycosidase
activity.

60. A method for producing a human-like glycoprotein having an N-glycan
comprising at least two GlcNAcs attached to a trimannose core.

ALG3 Blast 05-22-01

                                                                  Score    E
Sequences producing significant alignments:                       (bits)  Value

gi|586444|sp|P38179|ALG3_YEAST DOLICHYL-P-MAN:MAN(5)GLCNAC(...     797   0.0
gi|3024226|sp|Q92685|ALG3_HUMAN DOLICHYL-P-MAN:MAN(5)GLCNAC...     173   7e-43
gi|3024221|sp|Q24332|NT56_DROVI LETHAL(2)NEIGHBOUR OF TID P...     145   3e-34
gi|3024222|sp|Q27333|NT56_DROME LETHAL(2)NEIGHBOUR OF TID P...     121   3e-27
gi|10720153|sp|P82149|NT53_DROME LETHAL(2)NEIGHBOUR OF TID ...     121   5e-27
gi|1707982|sp|P40989|GLS2_YEAST 1,3-BETA-GLUCAN SYNTHASE CO...      32   2.8
gi|1346146|sp|P38631|GLS1_YEAST 1,3-BETA-GLUCAN SYNTHASE CO...      31   6.6


Alignments 

Yeast 



>gi|586444|sp|P38179|ALG3_YEAST DOLICHYL-P-MAN:MAN(5)GLCNAC(2)-PP-DOLICHYL MANNOSYLTRANSFERASE
(DOL-P-MAN DEPENDENT ALPHA(1-3)-MANNOSYLTRANSFERASE)
(HM-1 KILLER TOXIN RESISTANCE PROTEIN)
Length = 458

Score = 797 bits (2059), Expect = 0.0
Identities = 422/458 (92%), Positives = 422/458 (92%)

Ouerv • • 1 MEGEQSPQGEKSLQRKQFVRPPLDLWQDLKDGVRYVT FDCRANLI VMPLLILFESMLCKI 60 

MEGEQS PQGEKS LQRKQFVRPPLDLWQDLKDGVRYVT FDCRANLIVMPLLI LFESMLCKI 
Sbjct: 1 MEGEQS PQGEKS LQRKQFVRPPLDLWQDLKDGVRYVI FDCRANLIVMPLLI LFESMLCKI 60 

Ouerv- 61 1 1 KKVAYTEIDYKAYMEQIEMIQLDGMLDYSQVSGGTGPLVYPAGHV^ 120 

1 1 KKVAYTE I D YKAYMEQI EMI QliDGMLD YS QVS GGTGPLVY PAGHVLI YKMMYWLTEGM 
Sbjct: 61 IIKKVAYTEIDYKAYMEQIEMIQLDG^YSQVSGGTG^ 120 

Ouerv 121 DHVERGQVF FR YL YIiLTLALQMACYYLLHLP P WCWLACLS KRLHS I YVLRLFNDCFTTL 180 

" DHVERGQVFFRYLYLLTIjALQMACYYLLHLPPWCVVIA^ KRLHS I YVLRLFNDCFTTL 

Sbjct: 121 DHVERGQVF FRYLYLLTLALQMACrrYIiIi^ KRLHS I YVLRLFNDCFTTL 180 

Ouerv 181 FMVvTVIXSAIVASRCHQRPKLKK^ 240 

* PJ4VVTVLGATVASRCHQRPKLKKSLALVI SATYSMAVS I KMNAJjLYFPAMMISLFILNDA 

Sbjct: 1B1 FMVVTVLGAI VASRCHQRPKLKKSLALVT SATYSMAVS I KMNALL YF P AMMI SLFI LNDA 240 

Ouerv- 241 NVILTLLDLVAMIAWQVAVAVPFLRSFPQQYIiHGAI^ 300 

NVT LTLLDLVAMI AWQVAVAVP FLRS FPQQ YLHCAFNFGRKFMYQWS INWQMMDEEAFND 
Sbjct: 241 NVT LTLLDLVAMIAWQVAVAVP FLRS FPQQ YLHCAFNFGRKFMYQWS I NWQMMDEEAFND 300 

Ouerv- 301 KRFXXXXXXXXXXXXXXXF^ 360 

y ' FVTRYPRI LPDLWS SLCHPLRKNAVLNANPAKTI PFVLIASN 

Sbjct: 301 KRFHLALLISHLIALTTLFVTRYPRILPDLWS SLCHPLRK^ PFVLIASN 360 

Query- 361 FIGVLFSRSLHYQFLSW^fHWTLPILIFWSGMPFFVGPIWYVLHEW 420 

FIGVLFSRSLHYQFLSWYHWTLPILIFWSGMPFFVGPIWYVIJIEWCWNSYPPNSQ 
Sbjct: 361 FIG^FSRSLHYQFLSWYHVHTLPILIFWSGMPFFVGP IWYVLHEWCWNSYPPNSQASTLL 420 

Query: 421 XXXXXXXXXXXXXXXXSGSVALAKSHLRTTSSMEKKLN 458 

SGSVALAKSHLRTTSSMEKKLN 
Sbjct: 421 LALNTVLLLLLALTQLSGSVALAKSHLRT^ 458 

Human

>gi|3024226|sp|Q92685|ALG3_HUMAN DOLICHYL-P-MAN:MAN(5)GLCNAC(2)-PP-DOLICHYL
MANNOSYLTRANSFERASE (DOL-P-MAN DEPENDENT ALPHA(1-3)-MANNOSYLTRANSFERASE)
(NOT56-LIKE PROTEIN)
Length = 438

Sbjct: 29 WQER- - - - RLLLRE P R YTLLVAACLCLAEVG I T FWVIHR VAYTE I DWKAYMAEVEGV - IN 83 

Query- 86 GhtbDYSQVSGGTGPLVYPA^ 145 

G DY+Q+ G TGPLVYPAG V 1+ +Y+T + Q F LYL TL L Y 

Sbjct: 84 GTYD YTQLQGDTGPL VYPAGFVYI FMGLYYATSRGTDIRMAQNT FAVL YLATLLLVFL I Y 143 

0uerv . 146 y - LLHLP PW C - WLACLS KRLHS I YVLRLFNDCFTTLFMVvTVLGAJC VASRCHQRPKLKK 203 
Query. x + c g R+HSI+VLRIiKND + + +L + qr - ■ 

Sbjct: 144 HQTCKVPPFVFFFMCCAS YRVHS I FVLRLFNDP VAMVLLFLS INLLLAQRWGWG- 197 

Ouerv 204 S LALVI S ATYSMAVS I KMNALLYFP AMMI S LFI LNDANVT LTLLDLVAMI AWQVAVAVP F 263 

+S+AVS+KMN LL4- P +4- Ii L L + A + QV + +PF 

Sbjct: 198 CCFFSLAVSVKMNVLliFAPGLLFLLLTQFGFRGALPKLGICAGL- - QWLGLPF 249 

Query- 264 LRS FPQQYLHCAFNFGRKFMYQWS INWQMMDEEAFtTOKRFXXXXXXXXXXXXXXXFVTRY 323 

L P YL +F+ GR+F++ W++NW+ + E F + F + R+ 

Sbjct: 250 LLENPSGYLSRSroLGRQFLFHWTVNWR^ 309 

nuarv- 324 PRILPDLWSSLCHPIJIKNAVI.NANPAm 383 
Query. * + s L P ++ I L SNFIG+ FSRSLHYQF WY TLP 

Sbjct: 310 HRTGESILSLIJttPSKRKVPPQPLTPN^^ 369 

Ouerv 384 ILIF WSGMPFFVGPIWYVLHEWCWNSYPPNS 414 

L++ W + + + E WN+YP S 

Sbjct: 370 YLLWAMPARWLTHLLRLLVLGLI - -ELSWNTYPSTS 403 

Drosophila virilis

>gi|3024221|sp|Q24332|NT56_DROVI LETHAL(2)NEIGHBOUR OF TID PROTEIN (NOT58)
Length = 526

Score = 145 bits (366), Expect = 3e-34
Identities = 103/273 (37%), Positives = 157/273 (56%), Gaps = 17/273 (6%)

Ouerv- 33 VTIYVIFDCRANLI VMPLLILFESMLCKI I IKKVAYTEIDYKAYMEQIEMIQLDGMLDYSQ 92 

" ++Y+ F+ A IV L++L E+++ ++I++V YTEID+KAYM++ E L+G +YS 

Sbjct: 34 I KYLAFE P AALPI VS VL IVLAEAVINVL VT QRVP YTE I DWKAYMQECEGF - LNGTTNYSL 92 

Ouerv- 93 VSGGTGPLVYPAGHVLIYKMMYWLTEGMDHV^RGQW 151 

+ G TGPLVYPA V IY +Y+LT +V Q F +YLL + L + Y +P 
Sbjct: 93 I^GDTGPLVYPAAFVYIYSGLYYLTGQGTNVRIAQYIFACIYLI^M 152 

Query- 152 PWCVVLACL-SKRLHSIYVLRLFNDCFTTLFM 210 

P+ +VL+ S R+HS I YVLRLFND L +L A + QR L S 
Sbjct: 153 PYVLVLSAFTS YRIHS I YVLRLFND PVAIL LLYAALNLFLDQRWTLG- S 200 

Ouerv 211 ATYSMAVS IKMNALLYTPAMMISLFILNDANVT LTLLDLVAMI AWQVAVAVPFLRS FPQQ 270 

YS+AV +KMN + A f LF L + V+ TL+ L Q+ + PFLR+ P + 

Sbjct- 201 I CYSLAVGVKMN- - 1 LLFAPALLLFYLANLGVLRTLVQLTI CAVLQLFIGAP FLRTHPME 258 

Query: 271 YLHCAFNFGRKFMYQWSINWQMMDEEAFNDKRF 303
           YL  +F+ GR F ++W++N++ + +E F  + F
Sbjct: 259 YLRGSFDLGRIFEHKWTVNYRFLSKELFEQREF 291

Score = 53.3 bits (127), Expect = 1e-06

Identities = 31/62 (50%), Positives = 41/62 (66%), Gaps = 6/62 (9%) 

Query- 352 IPFVLIASOTIC^FSRSLHY^^ 409 

+PF L NFIGV +RSLHYQF WY +LP L+ WS P+ +G + +L E+CWN+ 
Sbjct: 412 LPFFL--CNFIGVACARSLHYQFYIWYFHSLPY^ 467 

Query: 410 YP 411 
YP 

Sbjct: 468 YP 469 

Drosophila melanogaster 

>gi|3024222|sp|Q27333|NT56_DROME LETHAL(2)NEIGHBOUR OF TID PROTEIN (NOT56)
(NOT45)
Length = 510

Score = 121 bits (305), Expect = 3e-27
Identities = 96/272 (35%), Positives = 154/272 (56%), Gaps = 17/272 (6%)

Query- 34 RYVI FD CRANLIVMPLLI LFE SMLCKI 1 1 KKVAYTE IDYKAYMEQ I EMI QLDGMLD YSQV 93 

+Y++ + A IV ++L E ++ ++I++V YTEID+ AYM++ E L+G +YS + 
Sbjct: 36 KYLLLEPAALPIVGLKVLLiAELVTNVVV^ 94 

Query: 94 SGGTGPLVYPAGHVLIYKMMYVTLTEGMDHVERGQW 152 

G TGPLVYPA V IY +Y++T +V Q F +YLL LAL + Y +PP 
Sbjct: 95 RGDTGPLVYPAAFVYIYSALYYVTSHGTNVRIAQYIF 154 

Query: 153 WCVVIACL- SKRLHSI YVLRLFNDCFTTLFMVVTVI^ SA 211 

+ +VL+ S R+HSIYVLRLFND + V +L A + +R L S 
Sbjct: 155 YVLVLSAFTSYRIHSIYVLRLFNDP VAVLLLYAALNLFLDRRWTLG ST 202 

Query: 212 TYSMAVS I KMNALLYFPAMMI SLFILNDANVILTLLDLVAMI AWQ VAVAVP FLRS FPQQY 271 

+S+AV +KMN + A + LF L + ++ T+L L Q+ + PFL + P +Y 

Sbjct: 203 FF SLAVGVKMN - - 1 LL FAPALLLFYLANLGLLRT I LQLAVCGVI QLLLGAP FLLTHPVEY 260 

Query: 272 LHCAFNFGRKFMYQWS INWQMMDEEAFNDKRF 303 

L +F+ GR F ++W++N++ + + F ++ F 
Sbjct: 261 LRG S FDLGRI FEHKWTVNYRFLS RDVFENRTF 292 

Score = 49.4 bits (117), Expect = 2e-05

Identities = 27/60 (45%), Positives = 35/60 (58%), Gaps = 2/60 (3%)

Query: 352 IPFVLIASNFIGVLFSRSLHYQFLSWYHWTLPILIFWSGMPFFTO^ 411 

+PF L N +GV SRSLHYQF WY +LP L + + V + L E+CWN+YP 

Sbjct: 407 LP FFL - - CNLVGVACSRS LHYQFYVWYFHS LP YIiAWSTP YS LGVRCL I LGL I E YCWNTYP 464 

Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Hits to DB: 28883317
Number of Sequences: 96469
Number of extensions: 1107545
Number of successful extensions: 2870
Number of sequences better than 10.0: 16
Number of HSP's better than 10.0 without gapping: 5
Number of HSP's successfully gapped in prelim test: 11
Number of HSP's that attempted gapping in prelim test: 2839
Number of HSP's gapped (non-prelim): 23
length of query: 458
length of database: 35,174,128
effective HSP length: 45
effective length of query: 413
effective length of database: 30,833,023
effective search space: 12734038499
effective search space used: 12734038499
T: 11
A: 40
X1: 15 (7.1 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 40 (21.8 bits)
S2: 67 (30.4 bits)
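
The effective lengths and the effective search space reported in the statistics above follow from the raw lengths by the usual BLAST edge correction: the effective HSP length is subtracted once from the query and once per database sequence. The short sketch below reproduces that arithmetic from the reported values; the variable names are illustrative and the correction formula is stated here as an assumption about how the figures were derived, since the source reports only the numbers.

```python
# Values copied from the BLAST statistics block above.
query_length = 458
database_length = 35_174_128
number_of_sequences = 96_469
effective_hsp_length = 45

# Edge correction: an alignment cannot start within one HSP length
# of the end of the query or of any database sequence.
effective_query_length = query_length - effective_hsp_length
effective_database_length = (
    database_length - number_of_sequences * effective_hsp_length
)

# The search space is the product of the two effective lengths.
effective_search_space = effective_query_length * effective_database_length

if __name__ == "__main__":
    print("effective length of query:   ", effective_query_length)
    print("effective length of database:", effective_database_length)
    print("effective search space:      ", effective_search_space)
```

The three computed values agree with the effective query length, effective database length and effective search space listed in the report.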

FIGURES

S. cerevisiae ALG3
ATGGAAGGTGAACAGTCTCCGCAAGGTGAAAAGTCTCTGCAAAGGAAGC 

AATTTGTCAGACCTCCGCTGGATCTGTGGCAGGATCTCAAGGACGGTGTG

CGCTACGTGATCTTCGATTGTAGGGCCAATCTTATCGTTATGCCCCTTTTG 

ATTTTGTTCGAAAGCATGCTGTGCAAGATTATCATTAAGAAGGTAGCTTAC 

ACAGAGATCGATTACAAGGCGTACATGGAGCAGATCGAGATGATTCAGCT 

CGATGGCATGCTGGACTACTCTCAGGTGAGTGGTGGAACGGGCCCGCTGG 

TGTATCCAGCAGGCCACGTCTTGATCTACAAGATGATGTACTGGCTAACA 

GAGGGAATGGACCACGTTGAGCGCGGGCAAGTGTTTTTCAGATACTTGTA 

TCTCCTTACACTGGCGTTACAAATGGCGTGTTACTACCTTTTACATCTACC 

ACCGTGGTGTGTGGTCTTGGCGTGCCTCTCTAAAAGATTGCACTCTATTTA 

CGTGCTACGGTTATTCAATGATTGCTTCACTACTTTGTTTATGGTCGTCACG

GTTTTGGGGGCTATCGTGGCCAGCAGGTGCCATCAGCGCCCCAAATTAAA 

GAAGTCCCTTGCGCTGGTGATCTCCGCAACATACAGTATGGCTGTGAGCA 

TTAAGATGAATGCGCTGTTGTATTTCCCTGCAATGATGATTTCTCTATTCAT 

CCTTAATGACGCGAACGTAATCCTTACTTTGTTGGATCTCGTTGCGATGAT 

TGCATGGCAAGTCGCAGTTGCAGTGCCCTTCCTGCGCAGCTTTCCGCAACA 

GTACCTGCATTGCGCTTTTAATTTCGGCAGGAAGTTTATGTACCAATGGAG 

TATCAATTGGCAAATGATGGATGAAGAGGCTTTCAATGATAAGAGGTTCC 

ACTTGGCCCTTTTAATCAGCCACCTGATAGCGCTCACCACACTGTTCGTCA 

CAAGATACCCTCGCATCCTGCCCGATTTATGGTCTTCCCTGTGCCATCCGC 

TGAGGAAAAATGCAGTGCTCAATGCCAATCCCGCCAAGACTATTCCATTC 

GTTCTAATCGCATCCAACTTCATCGGCGTCCTATTTTCAAGGTCCCTCCAC 

TACCAGTTTCTATCCTGGTATCACTGGACTTTGCCTATACTGATCTTTTGGT

CGGGAATGCCCTTCTTCGTTGGTCCCATTTGGTACGTCTTGCACGAGTGGT 

GCTGGAATTCCTATCCACCAAACTCACAAGCAAGCACGCTATTGTTGGCA 

TTGAATACTGTTCTGTTGCTTCTATTGGCCTTGACGCAGCTATCTGGTTCGG 

TCGCCCTCGCCAAAAGCCATCTTCGTACCACCAGCTCTATGGAAAAAAAG 

CTCAACTGA 



S. cerevisiae Alg3p

MEGEQSPQGEKSLQRKQFVRPPLDLWQDLKDGVRYVIFDCRANLIVMPLL
ILFESMLCKIIIKKVAYTEIDYKAYMEQIEMIQLDGMLDYSQVSGGTGPL
VYPAGHVLIYKMMYWLTEGMDHVERGQVFFRYLYLLTLALQMACYYLLHL
PPWCVVLACLSKRLHSIYVLRLFNDCFTTLFMVVTVLGAIVASRCHQRPK
LKKSLALVISATYSMAVSIKMNALLYFPAMMISLFILNDANVILTLLDLV
AMIAWQVAVAVPFLRSFPQQYLHCAFNFGRKFMYQWSINWQMMDEEAFND
KRFHLALLISHLIALTTLFVTRYPRILPDLWSSLCHPLRKNAVLNANPAK
TIPFVLIASNFIGVLFSRSLHYQFLSWYHWTLPILIFWSGMPFFVGPIWY
VLHEWCWNSYPPNSQASTLLLALNTVLLLLLALTQLSGSVALAKSHLRTT
SSMEKKLN
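
The nucleotide and amino acid sequences in this figure can be checked against each other by translating the open reading frame with the standard genetic code. The following is a minimal sketch under stated assumptions: the helper name `translate` is hypothetical, the codon table is the standard nuclear code, and the two input strings are the first 48 and the last 12 nucleotides of the S. cerevisiae ALG3 sequence shown above.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string in frame 1, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for start in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[start:start + 3]]
        if residue == "*":  # stop codon ends the open reading frame
            break
        protein.append(residue)
    return "".join(protein)


# First 48 and last 12 nucleotides of the S. cerevisiae ALG3 sequence above.
ALG3_START = "ATGGAAGGTGAACAGTCTCCGCAAGGTGAAAAGTCTCTGCAAAGGAAG"
ALG3_END = "AAGCTCAACTGA"

n_terminus = translate(ALG3_START)
c_terminus = translate(ALG3_END)

if __name__ == "__main__":
    print("N-terminal residues:", n_terminus)
    print("C-terminal residues:", c_terminus)
```

Applied to the whole coding sequence, the same function gives a protein that can be compared line by line with the Alg3p sequence in the figure.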

FIGURE 6

P. pastoris ALG3

ATGCCTCCGATAGAGCCAGCTGAAAGGCCAAAGCTTACGCTGAAAAATGT 

TATCGGTGATCTAGTGGCTCTTATTCAAAACGTTTTATTTAACCCAGATTTT 

AGTGTCTTCGTTGCACCTCTTTTATGGTTAGCTGATTCCATTGTTATCAAGG 

TGATCATTGGCACTGTTTCCTACACAGATATTGATTTTTCTTCATATATGCA 

ACAAATCTTTAAAATTCGACAAGGAGAATTAGATTATAGCAACATATTTG

GTGACACCGGTCCATTGGTTTACCCAGCCGGCCATGTTCATGCTTACTCAG 

TACTTTCGTGGTACAGTGATGGTGGAGAAGACGTCAGTTTCGTTCAACAA 

GCATTTGGTTGGTTATACCTAGGTTGCTTGTTACTATCCATCAGCTCCTACT 

TTTTCTCTGGCTTAGGGAAAATACCTCCGGTTTATTTTGTTTTGTTGGTAGC

GTCCAAGAGACTGCATTCAATATTTGTATTGAGACTCTTCAATGACTGTTT 

AACAACATTTTTGATGTTGGCAACTATAATCATCCTTCAACAAGCAAGTAG 

CTGGAGGAAAGATGGCACAACTATTCCATTATCTGTCCCTGATGCTGCAG 

ATACGTACAGTTTAGCCATCTCTGTAAAGATGAATGCGCTGCTATACCTCC 

CAGCATTCCTACTACTCATATATCTCATTTGTGACGAAAATTTGATTAAAG 

CCTTGGCACCTGTTCTAGTTTTGATATTGGTGCAAGTAGGAGTCGGTTATT 

CGTTCATTTTACCGTTGCACTATGATGATCAGGCAAATGAAATTCGTTCTG 

CCTACTTTAGACAGGCTTTTGACTTTAGTCGCCAATTTCTTTATAAGTGGA 

CGGTTAATTGGCGCTTTTTGAGCCAAGAAACTTTCAACAATGTCCATTTTC 

ACCAGCTCCTGTTTGCTCTCCATATTATTACGTTAGTCTTGTTCATCCTCAA 

GTTCCTCTCTCCTAAAAACATTGGAAAACCGCTTGGTAGATTTGTGTTGGA 

CATTTTCAAATTTTGGAAGCCAACCTTATCTCCAACCAATATTATCAACGA 

CCCAGAAAGAAGCCCAGATTTTGTTTACACCGTCATGGCTACTACCAACTT 

AATAGGGGTGCTTTTTGCAAGATCTTTACACTACCAGTTCCTAAGCTGGTA 

TGCGTTCTCTTTGCCATATCTCCTTTACAAGGCTCGTCTGAACTTTATAGCA

TCTATTATTGTTTATGCCGCTCACGAGTATTGCTGGTTGGTTTTCCCAGCTA 

CAGAACAAAGTTCCGCGTTGTTGGTATCTATCTTACTACTTATCCTGATTC 

TCATTTTTACCAACGAACAGTTATTTCCTTCTCAATCGGTCCCTGCAGAAA 

AAAAGAATACATAA 



P. pastoris Alg3p

MPPIEPAERPKLTLKNVIGDLVALIQNVLFNPDFSVFVAPLLWLADSIVI
KVIIGTVSYTDIDFSSYMQQIFKIRQGELDYSNIFGDTGPLVYPAGHVHA
YSVLSWYSDGGEDVSFVQQAFGWLYLGCLLLSISSYFFSGLGKIPPVYFV
LLVASKRLHSIFVLRLFNDCLTTFLMLATIIILQQASSWRKDGTTIPLSV
PDAADTYSLAISVKMNALLYLPAFLLLIYLICDENLIKALAPVLVLILVQ
VGVGYSFILPLHYDDQANEIRSAYFRQAFDFSRQFLYKWTVNWRFLSQET
FNNVHFHQLLFALHIITLVLFILKFLSPKNIGKPLGRFVLDIFKFWKPTL
SPTNIINDPERSPDFVYTVMATTNLIGVLFARSLHYQFLSWYAFSLPYLL
YKARLNFIASIIVYAAHEYCWLVFPATEQSSALLVSILLLILILIFTNEQ
LFPSQSVPAEKKNT

P. pastoris ALG3 BLAST

                                                                  Score    E
Sequences producing significant alignments:                       (bits)  Value

gi|586444|sp|P38179|ALG3_YEAST Dolichyl-P-Man:Man(5)GlcNAc(...     228   2e-58
gi|12802365|gb|AAK07848.1|AF309689_10 putative NOT-56 manno...     212   8e-54
gi|984725|gb|AAA75352.1| ORF 1                                     206   4e-52
gi|7492702|pir||T39084 probable mannosyltransferase - fissi...     176   8e-43
gi|16226531|gb|AAL16193.1|AF428424_1 At2g47760/F17A22.15 [A...     164   2e-39
gi|25367230|pir||B84919 Not56-like protein [imported] - Ara...     164   3e-39
gi|25814791|emb|CAB70171.2| Hypothetical protein K09E4.2 [C...     161   2e-38
gi|17535001|ref|NP_496950.1| Putative plasma membrane membr...     160   3e-38
gi|1654000|emb|CAA70220.1| Not56-like protein [Homo sapiens...     155   2e-36
gi|13279206|gb|AAH04313.1|AAH04313 Unknown (protein for IMA...     154   2e-36
gi|22122365|ref|NP_666051.1| hypothetical protein MGC36684 ...     150   3e-35
gi|21292031|gb|EAA04176.1| agCP3388 [Anopheles gambiae str....     120   4e-26
gi|1780792|emb|CAA71167.1| lethal(2)neighbour of tid [Droso...     114   3e-24

Alignments

S. cerevisiae

Score = 228 bits (580), Expect = 2e-58
Identities = 154/429 (35%), Positives = 229/429 (53%), Gaps = 37/429 (8%)

Query: 9 RPKLTLKNVI GDLVALI QNVLFNPDFSVFVAPLLWLADS IVTKVII GTVS YTDIDFS S YM 68 

RP L L DL ++ V+F+ ++ V PLL L +S++ K+II V+YT+ID+ +YM 

Sbjct: 20 RPPLDLWQ DLKDGVRYVT FDCRANIilVMPLLILFESMLCKI 1 1 KKVAYTEIDYKAYM 76 

Query: 69 QQI FKIR- QGELDYSNI FGDTGPLVYPAGHVHAYSVLSWY5DGGEDVS FVQQAFGWL YLG 127 

+QI 1+ G LDYS + G TGPIiVYPAGHV Y ++ W ++G + V Q F +LYI* 
Sbjct: 77 EQIEMIQLIX3MLDYSQVSGX3TGPLvYP^ 136 

Query: 128 CLLLSISSYFFSGIjGKIPPvYFVLLVASKRI^^ IILQ 184 

L L ++ Y+ L +PP VL SKRLHSI+VLKLFNDC TT M+ T+ 1+ 
Sbjct: 137 TLAIiQMACYY IJzHLPPWCVVIiACXSKRIiHSIYvIjRLFN^ 193 

Query: 185 QAS SWRKDGTT I PIiS VPDAADTYSIiAI SVKMNXXXXXXXXXXXXXXXOT I KALAPXX 244 

+ K ++ L + + TYS+A+S+KMN D N+I h 

Sbjct: 194 RCHQRPKLKKSIiALVI SATYSMAVS I KMNALLYFPAMMISLFI LNDANVTLTLLDLV 250 

Query: 245 XXXXXXXXXXYSFILPLHYDIWra^ 304 

F+ Y AF+F R+F+Y+W++NW+ + +E FN+ 

Sbjct: 251 AMIAWQVAVAVPFL RSFPQQYIiHCAFNFGRKFMYQWSINWQMMDEEAFNDK 301 

Query: 305 HFHQIiLFAIiHIITL-VXjFILKFLSPKNIGKPL 3 62 

FH L H+I I* LF+ ++ R + D++ L ++N +P ++ 

Sbjct: 3 02 RFHIoALLISHLXALTTLFVTRY PRI LPDLWS S IiCHPLRJQNA VLNANPAKT 351 

Query: 3 63 PDFVTTVMATTNLIGVLFARSIjHYQFLSWYAFSLPYL^ 422 

F V+ +N I GVLF+RS LHYQFLSWY ++LP L++ + + F I Y HE+CW 
Sbjct: 352 IPF VLI ASNFI GVTjFSRSLHYQFLSVnrHWTLP I LI FWSGMPFFVGP I VTYVXiHEWCWN 408 

Query: 423 VFPATEQSS 431 

+P Q+S 
Sbjct: 409 SYPPNSQAS 417 

Neurospora crassa

Score = 212 bits (540), Expect = 8e-54
Identities = 140/400 (35%), Positives = 212/400 (53%), Gaps = 29/400 (7%)

Query: 35 SWVAPLLWLADS I VTKVI I GTVS YTDIDFSSYMQQ IF 94 

S + P L+L D+++ +11 V YT+ID+++YM+Q+ +1 GE DY+ + G TGPLVYP 
Sbjct: 33 S KL I P PAL FLVDALLCGL I 1 WKVP YTE I DWAAYME QVS Q I LS GERD YTKVRG GTGP L VYP 92 

Query: 95 AGHVHAYSVLSWYSDGGEDVSFVQQAFGWLYl^GCLLLSI SS YFFSGLGKI PPVYFVLLVA 154 

A HV+ Y+ L +D G ++ QQ F LY+ L + + Y+ K PP F LL 

Sbjct: 93 AAHVYIYTGLYHLTDEGI^ILI^QLFAGLYWTI^VVMGCYW QAKAPPYLFPLLTL 149 

Query: 155 SKRLHSiFvTiRLFNDCLTTFlxMLATIIILQQAS 214 

SKRLHSIFVLR FNDC + I Q+ +W+ A Y+L + VK 

Sbjct: 150 SKRLHSIFVLRCFNDCFAVTjFLWLAIFFFQR-RNWQA GALLYTLGLGVK 197 

M + + L F+ HY + Y 
Sbjct: 198 MTLLLSLPAVGIVLFLGSG- S FVTTLQLVATMGLVQILIGVPFL- - AHYPTE Y 247 

Query: 275 FRQAFDFSRQFLYKWTVNWRFLSQETFNNVHFHQLLFALH^ 333 

+AF+ SRQF +KWTVNWRF+ +E F + F L ALH++ L +FI +++ P K 
Sbjct: 248 LSRAFELSRQ FFFK>TTVNWRFVGEEI FLSKGFALTLLALHVLVLGI FI TTRWI KPAR- - K 305 

Query: 334 PLGRFVLD I FKFWKPTLS - PTNI INDPERSPDFVTTVMATTNLI GVLFARSLHYQFLSWY 392 

L + + + KPL+P+ + +p ++ x + + N +G+LFARSLHYQF ++ 

Sbjct: 306 SLVQLISPVLLAGKPPLWPEHRAAARDVTPRYIMTTILSANAVGLLFARSLHYQFYAYV 365 

Query: 393 AFSLPYLLYKARLNFI AS I IVYAAHE YCWLVFPATEQSS A 432 

A+S P+LL++A L+ + +++A HE+ W VFP+T SSA 
Sbjct: 3 66 AWSTPFLLWRAGLHPVIjVYLLWAVHEWAWNVFPSTPASSA 405 

Schizosaccharomyces pombe

Score = 176 bits (445), Expect = 8e-43
Identities = 132/390 (33%), Positives = 194/390 (49%), Gaps = 35/390 (8%)

Query: 42 LWLADS IVT KAHC IGTVS YTDIDFSS YMQQI FKIRQGELDYSNT FGDTGPLVYPAGHVHAY 101 

L L + + II V YT+ID+ +YM+Q+ GE DY ++ G TGPLVYP GHV Y 

Sbjct: 30 LLLLEI PFVFAI I SKVP YTE IDWI AYMEQVNSFLLGERDYKS LVGCTGPLVYPGGHVFLY 89 

Query: 102 S VTjSWYSDGGEDVSFVQQAFGWLYLGCLIiLS I SS YFFSGLGKI P PVYFVLLVASKRLHS I 161 

++L + +DGG ++ Q F ++Y + +1 Y F + + P +VLL+ SKRLHSI 
Sbjct: 90 TLLYYLTDGGTNIVRAQYIFAFVYW- - ITTAIVGYLFK- rVRAPFYIYVLLILSKRLHSI 146 

Query: 162 FVLRLFNDCTJTTFLMLATIIILQQ^ 221 

F+LRLFND + L + 1+ W + A+ S+A SVKM+ 

Sbjct: 147 FILRLFNDGFNS-LFSSLFILSSCKKKWVR ASILLSVACSVKMSSLLYV 194 

Query: 222 XXXXXXXXXXCDENLIKAI^ 281 

IH-+ LP + + + +y + QAFDF 

Sbjct: 195 PAYLVL LLQ ILGPKKTWMHI FVT 1 1 VQI LFS I PF LAYFWS YWTQAFDF 242 

Query: 282 SRQFLYKWTVNWRFLSQETFNNVIIFHQI^FAIJtt^ 341 

R F YKWTVNWRF+ + F + F + LH+ LV F K + + p 
Sbjct: 243 GRAFD YKWTVNWRFI PRS I FESTS FS TS I LFLHVALLVAFTCKHWNKLSRATP 295 

Query: 342 I FKFWKPTLS PTNI INDPERSPDFVYTVMATTNLIGVLFARSLHYQFLSWYAFSLPYLLY 401 
P L+ + +P+F++T +AT+NLIG+L ARSLHYQF +W+A+ PYL Y 
Sbjct: 296 - FAMVNSMLTLKPIiPKLQLATPNFI FTAIATSNLI GILCARSIiHYOFYAWFAWYS P YIiCY 354

Query: 402 KARUKTF I AS I IVYAAHEY CWLVT PATEQS S 431 

+A I ++ EY W VFP+T+ SS 

Sbjct: 355 QASFPAPIVIGLWMI1QEYAWNVFPSTKI1SS 384 
Arabidopsis thaliana

Score = 164 bits (415), Expect = 2e-39
Identities = 131/391 (33%), Positives = 194/391 (49%), Gaps = 29/391 (7%)

Query: 42 LWIADS I VTKVT I GTVS YTDIDFSS YMQQ I FKIRQGELDYSNI FGDTGPL VYPAGHVHAY 101

Ij LAD+I++ +11 V YT ID+ +YM Q+ GE DY N + GDTGP LVYPAG ++ Y 

Sbjct: 39 L I LAD AI LVAL 1 1 AYVP YTKIDWDAYMSQVS GFLGGERD YGNLKGDTG P LVYP AGFLYVY 98 

Query: 102 SVIjSWYSDGGEDVSFVQQAFGV^YLGCLIjLSISSYTFSGIjGIQ PP VYFvliLVASKRIiHSI 161

- S + + G +V Q FG LY+ L + + Y + + +P LL SKR+HSI 

Sbjct: 99 SAVQNLTGG--EVYPAQILFGVLYIVNI^IVLIIYVlCroV--VPV7WALS 154 

Query: 162 FVLRIiFNDCLTT FLMLAT III IiQQAS SWRKDGTT I PIaSVPDAADTYSLAI S VKMNXXXXX 221

FVLRLFNDC L+ A++ + +RX + + +S A+SVKMN 

Sbjct: 155 FVLRIiFNDCFAMTIiIjHASMALFL YRKWHLGMLV FSGAVSVKMNVLLYA 202 

Query: 222 XXXXXXXXXXCDENLIKA^ 281

N+1 ++ F++ +Y AFD 

Sbjct: 203 PTLLI^LKAM--NIIGOTS^ SYIANAFDL 251 

Query: 282 SRQFIjYKWTVNWRFLSQETFNttVHFHQ 341 

R F++ W+VN++F+ + F + F h H+ LV F + K+ G +G 
Sbjct: 252 GRVFIHFWSVNFKFVPERVFVSKEFAVCL^ 310 

Query: 342 I FKFWKP - TLS PTNT IND PERS PDFVYTVMATTNIi I GVIiFARS LHY QFLS WYAFS LP YIiL 400
U 1 * F P+LS+++ + + VTMN IG++FARSI1HYQF SWY +SLPYI1L 

Sbjct: 311 HFFLTIiPSSLSFSDVSASRIITKIHVVTAMFVGNFIGIWARS 370 

Query: 401 YKARLNFIASIIVYAAHEYCWIjVFPATEQSS 431 

++ +3;++ E CW V+P+T SS 

Sbjct: 371 WRTPFPTWLRLIMFLGIELCWNVYPSTPSSS 401 
K. lactis ALG3

TTTGTTTACAAGCTGATACCAACGAACATGAATACACCGGCAGGTTTACT 

GAAGATTGGCAAAGCTAACCTTTTACATCCTTTTACCGATGCTGTATTCAG 

TGCGATGAGAGTAAACGCAGAACAAATTGCATACATTTTACTTGTTACCA 

ATTACATTGGAGTACTATTTGCTCGATCATTACACTACCAATTCCTATCTT 

GGTACCATTGGACGTTACCAGTACTATTGAATTGGGCCAATGTTCCGTATC 

CGCTATGTGTGCTATGGTACCTAACACATGAGTGGTGCTGGAACAGCTAT 

CCGCCAAACGCTACTGCATCCACACTGCTACACGCGTGTAACACATACTG 

TTATTGGCTGTATTCTTAAGAGGACCCGCAAACTCGAAAAGTGGTGATAA 

CGAAACAACACACGAGAAAGCTGAG 

K. lactis Alg3p 

FVYKLIPTNMNTPAGLIJOGKA]SILLH^ 

GVLFARSLHYQFI^WYH^TXPVLLNWANVPYPIGVLWYLTHEWCWNS^ 
NATASTLLHA.CNTY CYWLYSZEDPQTEJKVVnKQHTRKL 
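
The relationship between the nucleotide listing and the protein listing above can be illustrated with a short translation routine. This is a sketch under stated assumptions: it uses the standard genetic code, reads the fragment in its first reading frame, and the names `CODON_TABLE` and `translate` are illustrative rather than taken from the application. The input is the first 45 nucleotides of the K. lactis ALG3 fragment as printed.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# (third base varies fastest). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string in frame 1; unknown codons become 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


fragment = "TTTGTTTACAAGCTGATACCAACGAACATGAATACACCGGCAGGT"
print(translate(fragment))  # FVYKLIPTNMNTPAG
```

The output agrees with the first fifteen residues of the K. lactis Alg3p listing, which suggests the printed fragment is read in frame 1 from its first nucleotide.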
K. lactis ALG3 BLAST

Sequences producing significant alignments:                          (bits)  Value

gi|586444|sp|P38179|ALG3_YEAST Dolichyl-P-Man:Man(5)GlcNAc(...
gi|984725|gb|AAA75352.1| ORF 1
gi|16226531|gb|AAL16193.1|AF428424_1 At2g47760/F17A22.15 [A...          72   1e-12
gi|25367230|pir||B84919 Not56-like protein [imported] - Ara...          72   1e-12
gi|21292031|gb|EAA04176.1| agCP3388 [Anopheles gambiae str...           69   2e-11
gi|20892051|ref|XP_148657.1| similar to Lethal(2)neighbour ...          65   2e-10



Alignments 



S. cerevisiae

Score = 125 bits (314), Expect = 1e-28

Identities = 60/120 (50%), Positives = 83/120 (69%), Gaps = 1/120 (0%)
Frame = +3

Query- 66 ANLLHPFT - DAVFS AMRVNAEQIAYILLVTNYIGVLFARSLHY^ 242 

++L HP +AV +A A+ I ++L+ +N+IGVLF+RSLHYQFLSWYHWTLP+L+ W+ 
Sbjct: 332 SSLCHPLRKNAVLNANP- -AKTIPFVLIASNFIGVLFSRSLHYQFLSWYHWTLPILIFWS 3 89 

Query: 243 NVP YPLCVLWYLTHEWCWNS YP PNATASTL * EDPQTRXWITKQHTR 422 

+p+ + +WY+ HEWCWNSYPPN+ ASTLL A NT L+ +V + KHR 

Sbjct: 390 GMPFFVGPIWYVIJ^C^SYPPNSQASTLLLAI^^ 448 
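
In these translated searches the Query coordinates are nucleotide positions, while the Sbjct coordinates are amino acid positions. The sketch below is an illustration only, with hypothetical helper names. It assumes 1-based, inclusive coordinates on the forward strand and shows how the reported frame and the number of query codons follow from the printed Query start and end.

```python
def reading_frame(query_start):
    """Forward reading frame (1, 2 or 3) of a 1-based nucleotide start position."""
    return (query_start - 1) % 3 + 1


def codons_spanned(query_start, query_end):
    """Number of whole codons between two 1-based, inclusive nucleotide positions."""
    span = query_end - query_start + 1
    if span % 3 != 0:
        raise ValueError("coordinates do not span a whole number of codons")
    return span // 3


# Query coordinates of the S. cerevisiae alignment above (66 to 422, Frame = +3).
print(reading_frame(66), codons_spanned(66, 422))
```

For the alignment above this gives frame 3 and 119 codons, which matches the printed "Frame = +3".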



A. thaliana
Score = 72.0 bits (175), Expect = 1e-12

Identities = 42/107 (39%), Positives = 57/107 (53%), Gaps = 3/107 (2%)
Frame = +3 

Query: 84 FTDAVFSAMRVNAEQIAYIIiLVTTTY^ 263 

F+D ' S + + E + + V N+IG++FARSLHYQF SWY ++LP LL PL 
Sbjct: 322 FSDVSASRI - 1 TKEHWTAMFVGNF I G I VFARS LHYQ FYS W YFYSLP YLLWRTP F PTWLR 380 

Query: 264 VLWYLTHEWCWNSYPPNATASTL LHACNTYCYWLYS*EDPQTRK 3 95 

++ + L E Cm YP ++S h LH WL DP K 

Sbjct: 381 LI MFLGI ELCWNVYPSTP S S S GLLLCLHL 1 1 LVGLWLAP SVD P YQLK 427 
S. cerevisiae ALG9
ATGAATTGCAAGGCGGTAA^ 

ATATATTCAGCCGACATTCTCGTTAATTTCAGATTGCGATGAAACTTTTAATTATT 

GGGAACCATTAAATITATTGGTACGTGG 

ACCCGAGTATTCTATTAGATCATGGGCTTTCTrATTACCTT^ 

TCCAGTAAACAAATTTACTGACCTAGAAAGTCATTGGAACl 1 i 1 1 CATC ACAAGA 

GCATGCTTAGGCITITTTAGTTTTATCATGGAATTTAAACTACATCG 

AGGCAGCTTGGCATTGCAAATCGCAAATATTTGGATTATTTTCCAATTGTTTAA 

CGGGCTGGTTCCATGCATCTGTGGAATTATTGCCTTCTGCCGTTGCCATGTTGTTG 

TATGTAGGTGCCACCAGACACTCTCTACGCTATCTGTCCACTGGGTCrACTTCTAA 

CTTTACGAAAAGTTTAGCGTACAATITCCTGGCTAGTATACTAGGCTGGCCATTTG 

TTTTAATTTTAAGCTTGCCATTATC 

CTACCATCAGAACCGCATTCGACTGCTGTITGATATTTTCATTGACTCK:ATTrGCT 

GTGATTGTCACTGACAGTATATTTTACGGGAAGCTTGCTCCTGTATCATGGAACA 

TCITATTTTACAATGTCATTAATGCAAGTGAGGAATCTGGCCCAAATATTTTCGGG 

GTTGAGCCATGGTACTACTATCCACTAAATTTGTTACrGAATTTCCCACTGCCTGT 

GCTAGTTTTAGCTATTTTGGGAATTITCCATTTGAGATTATGGCCATTATGGGCAT 

PATTATTCACATGGATTGCCGTTTTCACTCAACAACCTCACAAAGAGGAAAGATT 

TCTCrATCCAATTTACGGGTTAATAACTTTGAGTGCAAGTATCGCCTm 

TGTTGAATGTATTCAATAGAAAGCCGATTCITAAAAAAGGTATAAAGTTGTCAGT 

TTTATTAATTGTTGCAGGCCAGGCAATGTCACGGATAGTGGCTTTGGTGAACAAT 

TACACAGCTCCTATAGCCGTCTACGAGCAATTTTCTTCACTAAATCAAGGTGGTG 

TGAAGGCACCGGTAGTGAATGTATGTACGGGACGTGAATGGTATCACTTCCCAAG 

TTCTTTCCTGCTGCCAGATAATCATAGGCTAAAAT TTGTT AAATCTGGATTTGATG 

GTCTTCTTCCAGGTGATTTTCCAGAGAGTGGTTCTATTTTCAAAAAGATTAGAACT 

TTACCTAAGGGAATGAATAACAAGAATATATATGATACCGGTAAAGAGTGGCCG 

ATCACTAGATGTGATTATTTTATTGACATCGTCGCCCCAATAAATTTAACAAAAG 

ACGTTTTCAACCCTCTACATCTGATGGATAACTGGAATAAGCTGGCATGTGCTGC 

ATTCATCGACCKjTGAAAATTCTAAGATTTTGGGTAGAC^ATTTTACGTACCGGAG 

CCAATCAACCGAATCATGCAAATAGTTTTACCAAAACAATGGAATCAAGTGTACG 

GTGTTCGTTACATTGATTACTGTTTGTTTGAAAAACCAACTGAGACTACTAATTGA 



S. cerevisiae Alg9p 

MNCKAVTISLLLLLFLTRVYIQPTFSLISDOTETTOTWEPLNLLVRGFGKQTWEYSPE 
YSIRSWAFLLPFYCILYPVNKFTDLESHWN^ 

IANIWIIFQLFM'GWFHASVELLPSAVAMLLYVGATRHSLRYLSTGSTSNFT&SLAYN 

FXASILGWPFVLII^LPLCLFrYLFNHMSmTAFT)CCLIFSLTAFAVW 

VSWNILFYNVIKASEESGPNIFGVEPWYY^ 

ASLFTWIAVFTQQPHKEERFLYPIYGLITLSASIAF^V^ 

VAGOAMSRWALVNNYTAPIAVYEQFSSIJ^QGGVKAPVVNVCT 

DNHRLKFVKSGFDGLLPGDFPESGSIFKKIRTLPKGMNNKl^YDTC 

DIV APINLTKDVFNPIJEELMDNWNKIACAAFIIXj 

KQWNQVYGVRYTOYCIJFEKPTETTN 
tSototgtctgctcgatacttccttttacagtaaccaacatacatgtt

CTCCAACATGCTCTTGTATGTATTGGCCTATTCTATCTTGAGACTTGATATC 

AACCTTCTATGGTATTATTTCAGACTGTGATGAAGTGTTCAACTACTGGGA 

GCCACTCAACTTCATGCTTAGAGGGTTTGGAAAACAGACTTGGGAGTATT 

CTCCAGAGTATGCCATCCGATCTTGGTCCTATCTAGTGCCACTTTGGATAG 

CAGGCTATCCACCATTGTTCCTGGATATCCCTTCTTACTACTTTTTCTACTT 

TTTCAGACTACTGCTGGTTATTTTTTCATTGGTTGCAGAAGTCAAGTTGTA 

CCATAGTTTGAAGAAAAATGTCAGCAGTAAGATCAGTTTCTGGTACCTTCT 

ATTTACAACCGTTGCTCCAGGAATGTCTCATAGCACGATAGCCTTATTACC 

ATCCTCITITGCTATGGTTrGTCACACTTTTGCCATTAGATACGTCATTGAT 

TACCTACAATTACCAACATTAATGCGCACAATCAGAGAGACTGCTGCCAT 

CTCACCAGCTCACAAACAACAACTAGCCAACTCTCTC 

P. pastoris Alg9p 

WPSCLLDTSFYSNQHTCSPTCSCMYWPII^ZDLISTFYGnSDCDBVFNYWEPL 
>n r MLRGFGKQTWEYSPEYAIRSWSYLWLWIAGYPPLFLDIPSYYFFYITRLLL 
VIFSLV AEVKLYHSLKKKVSSKISFWYLIJEnTVAPGMS CH 
TFAmYVIDYLQLPTLMRTIRETAAISPAHKQQLAlSfSL 
P. pastoris ALG9 BLAST



                                                                      Score     E
Sequences producing significant alignments:                          (bits)  Value

gi|5324110|ref|NP_014180.1| catalyzes the transfer of manno...         131   1e-29
gi|21296668|gb|EAA08813.1| agCP7810 [Anopheles gambiae str...          110   2e-23
gi|7019765|emb|CAB75773.1| putative mannosyl transferase mv...         104   1e-21
gi|26341066|dbj|BAC34195.1| unnamed protein product [Mus mu...          99   4e-20
gi|16551378|gb|AAL25798.1| DIBD1 [Homo sapiens]                         99   4e-20
gi|19527202|ref|NP_598742.1| RIKEN cDNA B230402H15 [Mus mus...          99   4e-20
gi|12053349|emb|CAB66861.1| hypothetical protein [Homo sapi...          99   4e-20



Alignments 



S. cerevisiae

Score = 131 bits (329), Expect = 1e-29

Identities = 62/141 (43%), Positives = 91/141 (64%), Gaps = 1/141 (0%) 
Frame = +2 

Query: 200 ISTFYGI ISDCDEVTlTyWEPLNFM^ - PLF 376 

I + +ISDCDE FNYWEPLN ++RGFGKQTWEYSPEY+IRSW++L+P + YP F 
Sbjct: 21 IQPTFSLISDCDETFNYWEPLNLLWGro^ 80 

Query: 377 IJDIPSXXXXXXXRLLLVTFSL^^ 556 

D+ S R L FS + E KL+ + +++ +1+ +++F PG H+++ h 

Sbjct: 81 TDIiESHWOTFITRACLGFFSFIMEFKLHREIAGSLALQI 14 0 

Query: 557 LPS S FAMVCHT FAIRYVIDYL 619 

LPS+ AM+ + A R+ + 
Sbjct: 141 L P S AVAMLL YVGATRHS LRYL 161 

Anopheles gambiae 
Score = 110 bits (274), Expect = 2e-23 

Identities = 58/130 (44%), Positives = 79/130 (60%), Gaps = 3/130 (2%)
Frame = +2

Query: 197 LISTFYGIISDCDEVFNYWEPLNF^ 376 

L S Y IISDCDE +NYWEP1H-++L+G G QTWEYSPE+A+RS+SY LW+ G P 
Sbjct: 34 I^SALYSI ISDCDETYNYWEPLHYIiLKGKGFQTWEYSPEFALRSYS Y- - - LWLHGLPAKV 90 

Query: 377 LDIPS XXXXXXXRLLLVIFSLv7VEVTOjYH^ 547 

L + + R LL + + E +LY I* + ++ +LLF + GM S+ 

Sbjct: 91 LQLMTDNGVIjI FYFVRCLLAVTCALIiE YRL YRI LGRKCGGGVAS LWLLFQLTS AGMF I S S 150 

Query: 548 IALLPSSFAM 577 

ALLPSSF+M 
Sbjct: 151 AALLPSSFSM 160 
S. pombe

Score = 104 bits (260), Expect = 1e-21

Identities = 58/157 (36%), Positives = 85/157 (54%) 

Frame = +2 

Query 197 LISTFYGIISDODEVFNYVrePLNFMLRGFG 376 

L S + +1 DCDEV+NYWEPL++4-L G+G QTWEYSPEYAIRSW Y+ + G+ 
Sbjct: 26 LTSAS FRVIDDCDEVYNYWEPLHYLLYGYGLQTWEYS PE YAIRSWFYI ALHAVPGFLARG 85 

Query: 377 LD I P SXXXXXXXRLLLVI F S LVAEV KL YHS LKXNVS S KI S FWYLLFTTVAPGMSHST I AL 556 

L + R +L FS E L ++ +N + ++ V GM ++ + 

Sbjct: 86 LGLSRIjHVFYFIRGVLACFSAFCETKLILAVAR 145 

Query: 557 LPSSFAMVCHTFAIRYVIDYI^LPTLMRTIRETAAIS 667 

LPSSFAM T A+ L P+ RT++ + 1+ 

Sbjct: 146 LPSSFAMNMVTLALS AQLSPPSTKRTVKWSFIT 179 



M. musculus 
Score = 99.4 bits (246), Expect = 4e-20

Identities = 57/143 (39%), Positives = 76/143 (53%), Gaps = 1/143 (0%)
Frame = +2 

Query 152 SPTCSCMYWPI3^*DLISTFYGIISDCDEVFNYWEPIOT 331 

+P S + +LS L + ISDCDE FNYWEP ++++ G G QTWEYSP YAIRS+ 

Sbjct: 55 APEGSTAFKCLIiSARIjCAAIiIiSOT 114 

Query: 332 SY-LV1>LWIAGYPPLFLDIPSXXXXXXXRLI^^ 508 

+Y L+ W A + L R LL S V E+ Y ++ K +S L 

Sbjct: 115 AYLLLHAWPAAFHARILQTNKILVFYFLRCuLAFVS CVCELYFYKAVCKKFGLHVSRMML 174 

Query: 509 LFTTVAPGMSHSTIALIjPSSFAM 577 

F ++ GM S+ A LPSSF M 
Sbjct: 175 AFLVLSTGMFCSSSAFLPSSFCM 197 



H. sapiens 
Score = 99.4 bits (246), Expect = 4e-20 

Identities = 56/143 (39%), Positives = 76/143 (53%), Gaps = 1/143 (0%)
Frame = +2

Query: 152 SPTCSCMYWPILS*DLISTFYGIISDCDEVTNYWEPLNFM^ 331 

+p s + +LS L + ISDCDE FNYWEP ++++ G G QTWEYSP YAIRS+ 

Sbjct: 55 APEGSTAFKOjI^ARLCAALLSNISDCDETFNYWEP 114 

Query. 332 SY ^* h ^ ^ + ^ R LL S + E+ Y ++ K +S L 

Sbjct: 115 AYLLLHAWPAAFHARI LQTNKILVFYFLRCLLAFVS CI CEL YFYKAVCKKFGLHVSRMML 174 

Query: 509 LFTTVAPGMSHSTI ALLPSS FAM 577 

F ++ GM S+ A LPSSF M 
Sbjct: 175 AFLVLSTGMFCSSSAFLPSSFCM 197 
FIGURE 13

S. cerevisiae ALG12

ATGCGTTGGTCTGTCCTTGATACAGTGCTATTGACCGTGATTTCCTTTCATCTAAT 

CCAAGCTCCATTCACCAAGGTGGAAGAGAGTTTTAATATTCAAGCCATTCATGAT 

ATTITAACCTACAGCGTATTTGATATCTCCCAATATGACCACTTGAAATTTCCTGG 

AGTAGTCCCTAGAACATTCGTTGGTGCTGTGATTATTGCAATGCTTTCGAGACCTT 

ATCTTTACTTGAGTTCTITGATCCAAACTrCCAGGCCTACGTCTATAGATGTTCAA 

TTGGTCGTTAGGGGGATrGTTGGCCTCACCAATGGGCTTTCTI 1 1 ATCTATTTAAA 

GAATTGTTTGCAAGATATGTTTGATGAAATCACTGAAAAGAAAAAGGAAGAAAA 

TGAAGACAAGGATATATACATTrACGATAGCGCTGGTACATGGTTTCTmATTTT 

TAATTGGCAGTITCCACCTCATGTTCTACAGCACTAGGACTCTGCCrAATTTTGTC 

ATGACTCTGCCTCTAACCAACGTCGCATTGGGGTGGGTTTrATTGGGTCGTTATAA 

TGCAGCTATATTCCTATCTGCGCTCGTGGCAATTGTATTTAGACTGGAAGTGTCAG 

CTCTCAGTGCTGGTATrGCTCTATITAGCGTCATCITCAAGAAGATTTCTITATTC 

GATGCTATCAAATTCGGTATCTrTGGCITGGGACTTGGTTCCGCCATCAGTATCAC 

CGTTGATTCATATTTCTGGCAAGAATGGTGTCTACCTGAGGTAGATGGTTTCTTGT 

TCAACGTGGTTGCGGGTTACGCTTCCAAGTGGGGTGTGGAGCCAGTTACTGCTTA 

TTTCACGCATTACTTGAGAATGATGTTTATGCCACCAACTGTTTTACTATrGAATT 

ACTrCGGCTATAAATTAGCACCTGCAAAATTAAAAATTGTCTCACTAGCATCTCTT 

TTCCACATTATCGTCTTATCCT1TCAACCTCACAAAGAATGGAGATTCATCATCTA 

CGCTGTTCCATCTATCATGTTGCTAGGTGCCACAGGAGCAGCACATCTATGGGAG 

AATATGAAAGTAAAAAAGATTACCAATGTTITATGITrGGCTATATrGCCCTTATC 

TATAATGACCTCCITITTCATTTCAATGGCGTTCTrGTATATATCAAGAATGAATT 

ATCCAGGCGGCGAGGCTTTAACrriUl'r 1 1 AATGACATGATTGTGGAAAAAAATAT 

TACAAACGCTACAGTTCATATCAGCATACCTCCTTGCATGACAGGTGTCACTTTAT 

ITGGTGAATTGAACTACGGTGTGTACGGCATCAATTACGATAAGACTGAAAATAC 

GACTTTACTGCAGGAAATGTGGCCCTCCITrGATTTCTTGATCACCCACGAGCCA 

ACCGCCTCTCAATTGCCATTCGAGAATAAGACTACCAACCATTGGGAGCT AGTTA 

ACACAACAAAGATGTTTACTGGATTTGACCCAACCTACATTAAGAACTTTGTTTT 

CCAAGAGAGAGTGAATGTTTTGTCTCTACTCAAACAGATCATTTTCGACAAGACC 

CCTACCGTITTTTTGAAAGAATTGACGGCCAATTCGATTGTTAAAAGCGATGTCTT 

CTTCACCTATAAGAGAATCAAACAAGATGAAAAAACTGATTGA 



S. cerevisiae Alg12p



MRWSVLDTVLLTVISFHLIQAPFTKV^^ 

RTWGAVHAMI^I^YLYLSSLIQTSRPTSroVQLVVRGIVGLTKGLSFrYLKNCLQDM 

FDEITEKKKEENEDKDIYTYDSAGTWFLIJFLIGSFHIMFYSm^ 

GWVLLGRYNAAIFLSALVAIVFFJLEVSALSAGIA^ 

AISITVDSYFWQEWCIJEVDGFLFNW^ 

LLLNYFGYKLAPAKIJOVSLASLFHIIV^ 

HSMKVKKTOm.CIAILPLSMTSFFK 

ATVfflSIPPC]VrrGVTLFGELhr^GWGINYDKTENTTLLQEMWP^ 

FF>KTTKEIWELVNTTKMFrGFDPTYIO^FWQF^ 

ANSrVKSDVFFTYKFJKQDEKTD 
FIGURE 14

P. pastoris ALG12 

TCGGTCGAGAATGATAACTGAAGAACTCAAAATCTCTCACACTTTCATCGT 

TACTGTACTGGCAATCATTGCATTTCAGCCTCATAAAGAATGGAGATTTAT 

AGTTTACATTGTTCCACCACTTGTCATCACCATATCTACAGTACTTGCACA 

ACTACCCAGGAGATTCACAATCGTCAAAGTTGCTGTTTTTCTCCTAAGTTT 

CGGCTCTTTGCTCATATCCCTGTCGTTTCTTTTCATCTCATCGTATAACTAC 

CCTGGGGGTGAAGCTTTACAGCATTTGAACGAGAAACTCCTTCTACTGGA 

CCAAAGTTCCCTACCTGTTGATATTAAGGTTCATATGGATGTCCCTGCATG 

CATGACTGGGGTGACTTTATTTGGTTACTTGGATAACTCAAAATTGAACAA 

TTTAAGAATTGTCTATGATAAAACAGAAGACGAGTCGCTGGACACAATCT 

GGGATTCTTTCAATTATGTCATCTCCGAAATTGACTTGGATTCTTCGACTG 

CTCCCAAATGGGAGGGGGATTGGCTGAAGATTGATGTTGTCCAAGGCTAC 

AACGGCATCAATAAACAATCTATCAAAAATACAATTTTCAATTATGGAAT 

ACTTAAACGGATGATAAGAGACGCAACCAAACTTGATGTTGGATTTATTC 

GTACGGTCTTTCGATCCTTCATAAAATTTGATGATAAATTATTCATTTATG 

AGAGGAGCAGTCAAACCTGAAAATATATACCTCATTTGTTCAATTTGGTGT 

AAAGAGTGTGGCGGATAGACTTCTTGTAAATCAGGAAAGCTACAATTCCA 

ATTGCTGCAAAAAATACCAATGCCCATAA 

P. pastoris Alg12p

RMTTEELKISHTFrVTVIAIIAFQPHKEWRFIV^ 

KVAWLI^FGSLLISI^FLFISSYNYPGGEALQHLNEKLLLIJ)QSSLPVDIKVH 
MDWACMTGVTIJGYl^NSKLNNLPJVYDKTEDESLDTIWDSFNYVB 
SSTAPKWEGDWIJGDWQGYNGINKQSIKNT^^ 
RTWPvSFIKFDDKLFTYERS S Q 






FIGURE 15 (sheet 1) 



P. pastoris ALG12 BLAST 



                                                                      Score     E
Sequences producing significant alignments:                          (bits)  Value

gi|1302525|emb|CAA96310.1| ORF YNR030W [Saccharomyces cerev...         102   5e-21
gi|19112221|ref|NP_595429.1| putative involvement in cell w...          56   5e-07
gi|15864569|emb|CAC63681.1| putative dolichyl-p-man:Man7Gl...           53   4e-06
gi|13129114|ref|NP_077010.1| dolichyl-p-mannose:Man7GlcNAc2...          53   4e-06
gi|22266724|gb|AAM94900.1|AF311904_1 membrane protein SB87              53   4e-06
gi|18478284|emb|CAD22101.1| putative mannosyl transferase [M...         52   8e-06



Alignments 
S. cerevisiae 
Score = 102 bits (255), Expect = 5e-21 

Identities = 74/258 (28%), Positives = 121/258 (46%), Gaps = 19/258 (7%)

Query: 8 RMI TEELKI SHTFIVTVDAI I AFQPHKEWRFIVYIVP PLVI TI S TVLiAQLPRRFTIVKVA 187
++ +LKI + + +++FQPHKEWRFI+Y VP +++ +T A L + K+
Sbjct: 302 KLAPAKLKI VS LAS L FHI IVLS FQPHKEWRF 1 1 YAVPS IMLLGATGAAHLWENMKVKKI T 361

Query: 188
+ NYPGGEAL N+ ++ + VH+
Sbjct: 362 NVLCIAILPLSIMTSFFISMAFLYISR24NYPGGEALTSFNDMIV EKKITNATVHIS 417

Query: 347 WAC^GVTIiFGYIiDNSKIJWIiRIVYDKTEDES - LDTTWDSFNYVI SEIDLDSS 505
+P CMTGVTLFG L+ I YDKTE+ + L +W SF+++I S++ ++
Sbjct: 418 IPPCMTGVTLFGELNYGVYG INYDKTENTTLLQEMWPSFDFLITHEPTASQLPFENK 474

Query: 506 TAPKWEGDWLKIDWQGYNGINKQSIKNTI FN YGILKRMIRDATKLDVGFIRTVF 670
T WE ++ + + G + IKN +F +LK++I D K F++ +
Sbjct: 475 TTNHWE LVNTTKMFTGFDPTYIKNFVFQERVNVLSLLKQIIFD- - KTPTVFLKEIiT 528

Query: 671 RSFIKFDDKLFIYERSSQ 724
+ 1 D F Y+R Q
Sbjct: 529 ANSIVKSDVFFTYKRIKQ 546



S. pombe

Score = 56.2 bits (134), Expect = 5e-07 

Identities = 46/152 (30%), Positives = 62/152 (40%), Gaps = 11/152 (7%) 

Query: 65 1 1AFQPHKEWRFI VYI VPPLVITI STVLAQL PRRFTIVKVAVXXXXXXXXXX 220 

+ +F HKEWRFI+Y + P S+AL +F 

Sbjct: 295 VYS FLGHKEWRF I I YS I - PWFNAAS AI GAS LCFNAS KFGKKI FEI LRLMFFS GI I FGF I G 353 

Query: 221 XXXXXXXXXYNYPGGEAIiQHLNEKLLLLDQS SLPVDI KVHMDVPACMTGVTLFG YIjDNSK 400 

Y YPGG AIi L E + VHMDV CMTG+T F I> + 

Sbjct: 354 SSFIiLYVFQYAYPGGLALTRLYE IENHPQVSVHMDVYPCMTGITRFSQLPS - - 404 






FIGURE 15 (sheet 2) 

Query: 401 LNNLRIVYDKTEDESL DTTWDSFNYVTSE 487 

YDKTED + F+Y+I+E 
Sbjct: 405 WYYDKTEDPKMLSNSLFISQFDYLITE 431 



Homo sapiens 
Score = 53.1 bits (126), Expect = 4e-06 

Identities = 41/149 (27%), Positives = 68/149 (45%), Gaps = 6/149 (4%)

Query: 59 I^IAFQPHKEWRFIVYIVPPLVITISTVIAQLPRR FTIVKVAVXXXXXXXXXX 220 

+A+ + PHKE RFI+Y PLIT++L + +V 

Sbjct: 299 MALYSLLPHKELRFI I YAFPNUiNT T AARGCS YLIiNNYKKS WL YKAGS LLVT GHLVVNAAY 358 

Query: 221 XXXXXXXXXYim>GGEALQHLNEKLLLI^ 400 

+NYPGG A+Q L++ L+ Q+ D+ +H+DV A TGV+ F ++++ 
Sbjct: 359 SATALYVSHFNYPGGVAMQRIiHQ- -LVPPQT DVLLHIDVAAAQTGVSRFLQVNSAW 412 

Query: 401 LNNLRIVYDKTEDESLDTrWDSFNYVISE 487 

YDK ED T ++ +++ E 
Sbjct: 413 R YDKREDVQPGTGMLAYTHI LME 435 
FIGURE 25

S. cerevisiae ALG6
ATGGCCATTGGCAAAAGGTTACTGGTGAACAAACCAGCAGAAGAATCATT 

TTATGCTTCTCCAATGTATGATTTTTTGTATCCGTTTAGGCCAGTGGGGAA 

CCAATGGCTGCCAGAATATATTATCnTGTATGTGCTGTAATACTGAGGTG 

CACAATTGGACTTGGTCCATATTCTGGGAAAGGCAGTCCACCGCTGTACG 

GCGATTTTGAGGCTCAGAGACATTGGATGGAAATTACGCAACATTTACCG 

CTTTCTAAGTGGTACTGGTATGATTTGCAATACTGGGGATTGGACTATCCA 

CCATTAACAGCATTTCATTCGTACCTTCTGGGCCTAATTGGATCTTTTTTCA 

ATCCATCTTGGTTTGCACTAGAAAAGTCACGTGGCTTTGAATCCCCCGATA 

ATGGCCTGAAAACATATATGCGTTCTACTGTCATCATTAGCGACATATTGT 

TTTACTTTCCTGCAGTAATATACTTTACTAAGTGGCTTGGTAGATATCGAA 

ACCAGTCGCCCATAGGACAATCTATTGCGGCATCAGCGATTTTGTTCCAAC 

CTTCATTAATGCTCATTGACCATGGGCACTTTCAATATAATTCAGTCATGC 

TTGGCCTTACTGCTTATGCCATAAATAACTTATTAGATGAGTATTATGCTA 

TGGCGGCCGTTTGTTTTGTCCTATCCATTTGTTTTAAACAAATGGCATTGTA 

TTATGCACCGATTTTTTTTGCTTATCTATTAAGTCGATCATTGCTGTTCCCC 

AAATTTAACATAGCTAGATTGACGGTTATTGCGTTTGCAACACTCGCAACT 

TTTGCTATAATATTTGCGCCATTATATTTCTTGGGAGGAGGATTAAAGAAT 

ATTCACCAATGTATTCACAGGATATTCCCTTTTGCCAGGGGCATCTTCGAA 

GACAAGGTTGCTAACTTCTGGTGCGTTACGAACGTGTTTGTAAAATACAA 

GGAAAGATTCACTATACAACAACTCCAGCTATATTCATTGATTGCCACCGT 

GATTGGTTTCTTACCAGCCATGATAATGACATTACTTCATCCCAAAAAGCA 

TCTTCTCCCATACGTGTTAATCGCATGTTCGATGTCCTITITrCTTTTTAGC 

TTTCAAGTACATGAGAAAACTATCCTCATCCCACTTTTGCCTATTACACTA 

CTCTACTCCTCTACTGATTGGAATGTTCTATCTCTTGTAAGTTGGATAAAC 

AATGTGGCTTTGTTTACGCTATGGCCTTTGTTGAAAAAGGACGGTCTTCAT 

TTACAGTATGCCGTATCTTTCTTACTAAGCAATTGGCTGATTGGAAATTTC 

AGTTTTATTACACCAAGGTTCTTGCCAAAATCTTTAACTCCTGGCCCTTCT 

ATCAGCAGCATCAATAGCGACTATAGAAGAAGAAGCTTACTGCCATATAA 

TGTGGTTTGGAAAAGTTTTATCATAGGAACGTATATTGCTATGGGCTTTTA 

TCATTTCTTAGATCAATTTGTAGCACCTCCATCGAAATATCCAGACTTGTG 

GGTGTTGTTGAACTGTGCTGTTGGGTTCATTTGCTTTAGCATATTTTGGCTA 

TGGTCTTATTACAAGATATTCACTTCCGGTAGCAAATCCATGAAGGACTTG 

TAG 

S. cerevisiae ALG6p 

MAIGKI^LVNKPAEESFYASPMYDFLYPFRPVGNQWIPEYirFVCAVILRCTIG 

LGPYSGKGSPPLYGDFEAQRHwMErTQHIJl^KwTWYDLQYWGLDYPPLTA 

FHSYLLGLIGSFFNPSWFALEKSRGFESPDNGLKTYMR5TVIK 

FTKWUJRYRNQSPIGQSIAASAILFQPSmLmHGHFQYNSVMLGLTAYAIN^ 

LLDEYYAMAAVCFVLS ICFKQMALYY APIFFAYLI^RSLIJPKFMAPXTVIAF 

ATLATFAIIFAPLYFLGGGIJChnHQClHRIFPFARGIF^ DKVAM^ CVTNVFVK 

YKERFTIQQLQLYSLIATVIGFl^AMIMTLI^KKHLLPYVLIACSMSFFLFSFQ 

VHEKTILIPLLPITLLYS STD WNVI^LVS WlhMVALFTLWPLLKKD GLHLQYA 

VSFLI^NWLIGNFSFITPRFLJPKSLTPGPSISSINSDYRI^ 

YIAMGFYHFII>QFVAPPSKYPDLW^LNCAVGFICFS]rWLWSYYKIFTSGSK 

SMKDL 
FIGURE 26



ATGCCACATAAAAGAACGCCCTCTAGCAGTCTGCTGTATGCAAGAATTCC 

AGGGATCTCTTTTGAAAACTCTCCGGTGTTTGATTTTTTGTCTCCTTTTGGA 

CCCGCTCCTAATCAATGGGTAGCACGATACATCATCATCATCTTTGCAATT 

CTCATCAGATTGGCAGTTGGGCTGGGCTCCTATTCCGGCTTCAACACCCCT 

CCAATGTATGGGGATTTTGAA.GCTCAGAGGCATTGGATGGAAATTACTCA 

GCATTTATCCATAGAAAAATGGTACTTCTACGACTTGCAATATTGGGGGCT 

TGACTATCCTCCCTTGACAGCCTTTCATTCATACTTCTTTGGCAAATTAGGC 

AGCTTCATCAATCCAGCATGGTTTGCTTTAGACGTCTCCAGAGGGTTTGAA 

TCAGTGGATCTAAAATCGTACATGAGGGCGACCGCAATTCTCAGTGAGCT 

GTTATGTTTTATTCCAGCTGTCATTTGGTATTGTCGTTGGATGGGACTTAAC 

TACTTCAATCAAAACGCCATTGAGCAAACTATAATAGCGTCTGCTATTCTT 

1TCAATCCATCTTTAATTATCATAGATCATGGCCACTTCCAGTACAACTCA 

GTTATGCTAGGTTTTGCTTTATTATCCATATTAAATCTGTTGTACGATAATT 

TTGCATTAGCGGCTATTTTTTTCGTTCTTTCAATAAGCTTTAAGCAAATGGC 

TCTCTATTATAGCCCCATCATGTTTTTTTACATGCTGAGTGTGAGTTGTTGG 

CCTTTGAAAAACTTCAACTTGTTGAGATTGGCTACTATCAGTATTGCAGTA 

CTCTTGACTTTTGCAACTCTATTACTGCCTTTTGTATTAGTAGATGGGATGT 

CACAAA1TGGCCAAATATTATTCAGAGTTTTCCCGTTTTCAAGAGGCTTGT 

TTGAGGATAAGGTGGCCAACTTTTGGTGTACAACGAATATACTGGTAAAG 

TACAAACAGTTATTCACTGACAAAACCCTTACTAGGATATCGCTAGTAGC 

AACTTTGATTGCAATTAGTCCGTCTTGCTTCATCATTJTTACTCACCCAAAG 

AAGGTTTTACTACCGTGGGCTTTTGCTGCTTGCTCTTGGGCGTTCTATCTTT 

TCTCTTTCCAAGTCCACGAGAAATCAGTTTTAGTTCCATTGATGCCTACCA 

CTCTATTACTGGTAGAAAAAGACTTGGACATCATCTCAATGGTCTGCTGGA 

TTTCTAATATTGCCTTCTTCAGCATGTGGCCTCTATTAAAAAGAGACGGGC 

TGGCTTTGGAATATTTTGTCTTGGGAATATTGAGTAATTGGCTGATTGGAA 

ACCTCAATTGGATTAGTAAATGGCTTGTCCCCAGTTTCCTGATTCCAGGGC 

CTACTCTCTCCAAAAAAGTTCCTAAAAGAGATACTAAAACAGTTGTTCAT 

ACTCACTGGTTTTGGGGGTCAGTAACATTCGTTTCATACCTCGGAGCTACA 

GTTATCCAGTTCGTAGATTGGCTGTACCTTCCACCTGCCAAGTATCCAGAT 

TrGTGGGTTATTTTGAACACTACATTGTCGTTTGCTTGTTTCGGGTTGTTTT 

GGCTATGGATTAACTACAATCTGTACATTTTGCGTGATTTTAAGCTTAAAG 

ATGCTTAG 

P. pastoris Alg6 

MPHKRTPSSSLLYAim'GISFENSPVroFI^PFGPAPNQWARYTrilFAILmLAV 

GLGSYSGFKTPPMYGDFEAQRHWMEITQHI^IEKWYFYDLQYWGLDYPPLT 

AFBSYFFGKLGSFINPAWFALDVSRGFESVDLKSYMRATAII^ELLCTIPAVIW 

YCRWMGLNYFNQNAIEQTnASAILFWSLnroHGHFQYNSVMLGFALLSILNL 

LYDNFALAAIFFVL^ISFKQMALYYSPIMFFYMI^VSCWPLKbn^LRLATIS^ 

AVLLTFATLLLPFVLVDGMSQIGQILFRVFPFSRGIi^DKVANFWCTTNBLVK 

YKQLFTOKTLTRISLVATLIAISPSCFWIHPKKVLIPWAFAACSWAFYLFSFQ 

VHEKSVLWIMPTTLLLVEKDLDHSMV^^ 

VLGn^NWLIGNim^KWLWSFLIPGPTI^KKW 

VTFVSYLGATVIQFVDWLYIJPAKYPDLWmNTTI^FACFGLFWLWI>m^ 
YILRDFKLKDA 
P. pastoris ALG6 BLAST



Score E 

Sequences producing significant alignments: 



Si. 

ai 
si 
ai 
ai 
at 
ai 
ai 
ai 
at 
at 
at 



1420090 | emb I CAA99190.il 



74905B4 Ipirl 



19921070 



15240920 



ref 



ref 



70193251 ref 



12002040lqb 



T40396 



1176671 lsplQ09226 I ALG6 CAEBL 



21302638|qb 



544l7SB|emb 



EAA147B3.1 



agCP4617 [Anopheles garabiae str 
probable glucosyl transferase [Sc 
13129070 1 ref IKP 0769B4.ll hypothetical protein MGC2B40 s 



CAB46771.1 



(bits) Value 



ORF YOR002w [Saccharomyces cerev. . 
glucosyl transferase - fission yeast . . 
NP 609393. ll CG5091-PA [Drosophila melanoga. . 
NP 198662. l| glucosyl transferase- like prote.. 
NP 037471. 1[ dolichyl-P-Glc:Man9GlcNAc2-PP-d. . 
AA G43l63.l|AF063604 1 brain my046 protein [H. . 

Probable dolichyl pyrophosp. . 



____ "glucosyl transferase [Homo sapiens] 
20835439 IreflXP 131506. ll similar to Dolichyl pyrophosph.. 



2996578 |emb I CAA12176.ll 



489 
369 
47 
244 
. 238 
. 236 
, 222 
■ 219 
, 192 
. 112 
112 
.104 



e-137 
e-101 
4e-64 
3e-63 
2e-61 
7e-61 
9e-57 
Be-56 
le-47 
le-23 
le-23 
3e-21 



Alignments 
S. cerevisiae 

Score = 489 bits (1259), Expect = e-137

Identities = 274/530 (51%), Positives = 358/530 (67%), Gaps = 5/530 (0%)

Query 20 SFENSPVTOFLSPFGPAFNQWVXXX^^ 79 

SF SP++DFL PF P NQW+ +GLG YSG +PP+YGDFEAQRH 

Sbjct: 16 SFYASPMYDFLYPFRPVGNQWLPEYIIFVCAVILRCTIGLGPYSGKGSPPLYGDFEAQRH 75 

Query- 80 Wl^ITQHI^ IEKWYFYDLQYWGLDYPPLTAFHS YFFGKLGS FINPAWFALDVSRGFESVD 139 

WMEITQHL + KWY+ YDLQ YWGLD YP PLTAFHS Y G +GSF NP+WFAL+ SRGFES D 
Sbjct: 76 W>EI TQHLPIiSKWYWYDIjQYWGIjDYPPLTAFHS YLLGLI GS FFNPSWFALEKSRGFESPD 135 

Query 140 - - LKSYMRATAILSELLCFI PAVT WYCRWMGLNYFNQNAIEQTI IASAILFNPSLI I IDH 197 

LK+YMR+T I+S++L + PAVT++ +W+G Y NQ+ I Q+I ASAILF PSL++IDH 
Sbjct: 136 NGLKTYMRSTVIISDILFYFPAVlYFTKWIiG-RYRNQSPIGQSIAASA^ 194 

Query: 198 GHFQYNSvMLGFALLSILNLLYDNFAIiAAI FFVIiS I S FKQMALYYS PIMFFYMLS VSCWP 257 

GHFQYNSVMLG +1 NLL + +A+AA+ FVLSI FKQMALYY+PI F Y+LS S 
Sbjct: 195 GHFQ YNSVMLGLTAYAJNNLLDEYTAMAAVCFVXiS I CFKQMALYYAP I FFAYIiLiSRS LL - 253 

Query: 258 LKNFNLLRIiAT I S I AVLLTFATLLLP - FVLVDGMSQ I GQ I LFRVFPFSRGLFEDKVANFW 316 

FN+ RL 1+ A L TFA + P + L G+ IQ + R+FPF+RG+FEDKVANFW 
Sbjct: 254 FPKFNI ARLTVT AFATLATFAI I FAPLYFLGGGLKNIHQCIHRI FPFARGI FEDKVANFW 313 

Query: 317 CTTNILVKYKQLFTDKTLTRI SLVATLIAI S PS CFI I FTHPKKVLLPWAFAACSWAFYLF 376 

C TO+ VKYK+ FT + L SL+AT+I P+ + HPKK LLP+ ACS +F+LF 
Sbjct: 314 CVTNVFVKYKERFTI QQLQLYSLI ATVT GFLPAMIMTLLHPKKHLLPYVLI ACSMSFFLF 373 

Query: 377 SFQVHEKSXXXXXXXXXXXXXEKDI^ 436 

SFQVHEK-f D +++S+V WI+N-fA F++WPLLK+DGL L+Y V + 

Sbjct: 374 SFQVHEKTILIPLLPITl^YSSTDWNVLSLVSWINNVALFTLWPLLKKTC 433 

Query: 437 LSNWLI GNLNWI S KWLVP S FL I PGPTLS KKVTKRDTKTVvTnilWFWGSvTFVS YLGATVI 496 

LSNWLIGN +P L PGP++S ++++ + W S +Y+ 

Sbjct: 434 LSNWLI GNFSF I TPRPLPKSLTPGPSISSINSDYRITCSLLPYl^^ 493 
Query: 497 QFVDWLYLPPAKYPDLWVTLNTTLS 546

p+D PP+KYPDLWV+LN + F CF +FWLW Y .4-+ +KD 
Sbjct: 494 HFIiDQFVAPPSKyPDIiWVLLNCAVGFI CFS I FWLWS YYKI FTSGSKSMKD 543 

S. pombe 

Score = 369 bits (946), Expect = e-101 

Identities = 228/513 (44%), Positives = 315/513 (61%), Gaps = 35/513 (6%)

Query: 21 FEN- SPVFDFLSPFGPAPNQWVXXXXXXXXX^ 79 

FEN +PV F+S F ++++ + +G YSG+NTPPMYGDFEAQRH 

Sbjct: 5 FENGAPVQQFVSRFRS YSS KFLFFPCLIMSLVFMQWLI S I GP YSGYNTPPMYGDFEAQRH 64 

Query: 80 WME I TQHLS I EKWYFYDLQ YWGLD YPPLTAFHS YFFGKLGS - FINPAWFAIiDVS RGFESV 138 

WME+T H + +WYF DLQ+WGLDYPPLTA+ S+FFG +G F NP WFA SRGFES+ 
Sbjct: 65 WMELTLHTPVSQWYFRDLQWWGLD YPPIjTAYVS WFFGI I GHYFFNPEWFADVTSRGFES L 124 

Query: 139 DLKSYMRATAII^ELLCFIPAVIWYCrom^ 198 

+LK +MR+T I S LL +P +++Y +W N +++ +LF P+L++IDHG 

Sbjct: 125 ELKLFMRSTVIASHLLILVPPLMFYSKWWSRRI - - PNFVDRNASLIMVLFQPALLLIDHG 182 

Query: 199 HFQYNSVMLG FALLS ILNLLYDNFALAAIFFVLS I SFKQMALYYS PIMFFYMLSVSCWPL 258 

HFQYN VMLG + +1 NLL + '+ A FF L+++FKQMALY++P +FFY+L P 
Sbjct: 183 KFQYNCVMLGLVtfTAIANLLK^ 242 

Query: 259 OTFNLLRLAT I S IAVL L T FATLLL P FVL VDGMS Q I GQ I LFRVFPFSRGLFEDKV 318 

F+ R +S+ V+ TF+ +L P++ +D + + QIL RVTPF+RGL+EDKVANFWCT 
Sbjct: 243 IRFS - -RFIIjLSVTVVFTFSLILFPWTYI^YKTLLPQILHRVF^ 300 

Query: 319 TNT LVKYKQLFTDKTLTRI S LVATL I AI S PS CF 1 1 FTHPKKVLLPWAFAACSWAF YLFS F 378 

N + K +++FT L ISL+ TLI+I PSC I+F +P+K LL FA+ SW F+LFSF 
Sbjct: 301 LNTVFKI REVFTLHQLQVI S LI FTL I S I LPS CVI LFLYPRKRLLALGFAS ASW GFFLFS F 360 

Query: 379 QVHEKSXXXXXXXXXXXXXEKD^ 438 

QVHEKS ++ + +N+A FS+WPLLK+DGL L+YF L ++ 

Sbjct: 361 QVHEKSVLLPLLPTSILLCHGNITTKPWIAI^^ 420 

Query: 439 NWLIGNLNWISK^VTSFLIPGPTLSKKVPKRDTKTVVHTHWFW^ 498 

NW IG++ SK ++ F + Y-M3 VI 

Sbjct: 421 NW - 1 GDMWFS KNVLFRF 1 QLS FYVGMI VI LG 451 

Query: 499 VDWLYLPPAKYPDLWVILNTTLSFACFGLFWLW 531 

+D PP++YPDLWVTLN TLSFA F +LW 
Sbjct: 452 IDLFIPPPSRYPDLWVTLNVTLSFAGFFTIYLW 484 



D. melanogaster

Score = 247 bits (630), Expect = 4e-64

Identities = 175/490 (35%), Positives = 267/490 (54%), Gaps = 55/490 (11%)

Query: 57 VGLGSYSGFNTPPMYGDFEAQRHWMEITQHLSIEKWYF YDLQYWGLDYPPLTAFHS 112 

+ L SYSGF++PPM+GD+EAQRHW EIT +L++ +WY DLQYWGLDYPPLTA+HS 
Sbjct: 19 ISLYSYSGFDSPPMHGDYEAQRHWQEITVNLAVGEWYTNSSNNDLQYWGLDYPPLTAYHS 78 

Query: 113 YFFGKIX3SFINPAWFALDVSRGFESVDLKSYMRATAILSEL 172 

Y G++G+ I+P + L SRGFES + K +MRAT + +++L ++PA++ + + 

Sbjct: 79 YLVGRI GAS IDPRFVELHKSRGFESKEHKRFMRATWSADVLI YLPAMLLLAYSLDKAFR 138 
Query: 173 NQNAIEQTIIASAILFNPSLI I IDHGHFQYNSVMLGFAUjS ILNLLYDNFALAAIFFVIjS 232

^* + + + ++A P +ID+GHFQYN++ LGFA ++I +L F AA FF L+ 

Sbjct: 139 SDDKLFLFTLVAAY- - - PGQTLIDNGHFQYKNISI^FAAVAIAAILRRRFYAAAFFFTLA 195 

Query: 233 I S FKQNIAL YYS P I MFFYMLS VS CWPLKNFN - - LLRIAT I S IAVLJjTFATLLLP FVLVDGM 290

+++KQM LY+S + FF L C K+F + ++ 1+ VL TFA L +P+ + + 
Sbjct: 196 LNYKQMELYHS - LPFFAFLLGECVSQKS FAS FIAE I SRIAAWLiGTFAILWVPW- - LGSL 252 

Query: 291 SQ I GQ I LFRVFP FS RGLFEDKVA2JFWCTTNILVKYKQLFTDKTLTRI S LVATL I AI S PS C 350 

+ Q+Ii R+FP + RG+ FEDKVAN WC N++ K K+ ++ + + + TLIA P+ 
Sbjct: 253 QAVLQVLHRLFPVARGVFEDKVAIAWC^ SNDQMALVCI ACTLIASLPTN 312 

Query- 351 FIIFTHPKKtTCaljPWAFAAC^AFYLFSTO 409 

++F V A S AF+LFSFQVHEK+ + + CW 

Sbjct: 313 VLLFRRRTNVGFLIiALFNTS IAFFLFS FQVHE KTI LLTALPA- LFLLKCWP 362 

Query: 410 iSNIAFFSMWPLLKTO^^ 464 

+ FSM PLL RDL+V + ++ +SK 
Sbjct: 363 DEM I L FIjEVTVF S^P IiLARDELLVP AWATVAFHL I FKC FD S KS K - - - : LS 410 

Query 465 KKVPKMTKTVVOTHWFWGSOTFVS YLGATVI QFVDWLYLP - PAKYPDIiWVT LNTTLS FA 523 

+ P+ + ++S + A+ L+PP KYPDLW ++ + S 

Sbjct: 411 NEYPLKTIANI SQILMISWVAS- LTVPAPTKYPDLWPLIISVTSCX5 456 

Query: 524 CFGLFWLWIN 533 

F LF+LW N 
Sbjct: 457 HFFLFFLWGN 466 



A. thaliana 

Score = 244 bits (622), Expect = 3e-63
Identities = 187/488 (38%), Positives = 248/488 (50%), Gaps = 39/488 (7%)

Query: 62 YSGFNTPPMYGDFEAQRHWMEITQHLSIEKWY FYDLQYWGLDYPPI/TAFHSYFFGK
YSG PP +GDFEAQRHWMEIT +L + WY + DL YWGLDYPPLTA+ SY G
Sbjct: 61 YS GAG I P P KFGDFEAQRHWMEI TTNLP VI DWYRNGT YNDLTYWGLDYP PLTAYQS YI HG I

Query: 118 LGS FINPAWFAIjDVSRGFESVDIjKS YMRATAI LiSELLCFI PAVIWYCRWMGLNYFNQNAI
F NP AL SRG ES K MR T + S+ F PA +++ N
Sbjct: 121 FLRFFOTESVAIjLSSRGHESYIjGKIjLM^

Query: 178 EQTIIASAIIiFNPSLI I IDHGHFQYNSTVMIiGFAIjI^IIiNIjLYDNFAIiAAIFFVlAS I
E + IL NP 1*1 + IDHGHFQYN + LG + +1 +L ++ h + F L++S KQ
Sbjct: 181 EVAVmiAMILLNPCMLIDHGHFQYNCISI^

Query: 238 MALYYSPIMFFYMLSVSCOTLKNFNLLRIiATISIAVIi
M+ Y++P F ++L C K+ +L + + IAV++TF P+ V + +L
Sbjct: 241 MSAYFAPAFFSHLLG-KOuRRKS-PILSVIia^IAVIVTFVTFWPY- -VHSLDDFLMVL

Query: 298 FRVFPFSRGLFEDKVANFWCTTNILVTCYKQIiFTDKTLTRI SLVATLI AI S PS CFI I FTHP
R+ PF RG++ED VANFWCTT+IIi+K+K LFT ++L ISIj AT++A PS P
Sbjct: 297

Query: 358
S AFYLFSFQVHEKS L + ++ A FS
Sbjct: 357
Query: 418 MWPLLKRDGLALEOTVLGILSNWLI GNIiNWISKWLVPSFL IPGPTLSKKVPKRD 471

M+PLL RDL + YLL + GN + IKVF PG 
Sbjct: 413 h^PLLC^KLLIPYLTLSFLFWIYHSPG 464 

Query: 472 TKTWHTHWFW3SVTFVSYIX3AOT 531 

++ TH+F V V Yh PP KYP L+ h L F+ F +F + 

Sbjct: 455 - - - LLRTHFFISWIiHVLYLTI K PPQKYPFLFEALIMILCFSYFIMFAFY 511 

Query: 532 tNYNLYIL 539 

NY + L 
Sbjct: 512 TNYTQWTL 519 
K. lactis ALG6

ATCTCTGTTTCAACAGCTCTTGCATTCATTGGTTCTTTCGGTCCAATCTATA 

TCTTTGGAGGATACAAGAACTTAGTGCAATCAATGCACAGGATTTTTCCAT 

TTGCCAGGGGTATCTTTGAAGATAAAGTTGCGAATTTTTGGTGCGTTTCTA 

ATATTTTCATCAAATATAGAAATCTATTCACTCAGAAGGATCTTCAATTAT 

ACTCATTACTCGCAACAGTTATTGGGCTTTTACCATCATTCATTATAACAT 

TTTTATACCCGAAGAGACATTTACTACCATATGCTTTGGCCGCATGTTCGA 

TGTCATTCTTCTTATTCAGCTTCCAGGTTCATGAAAAGACAATCTTATTAC 

CTTTACTTCCTATTACACTCTTGTACACGTCAAGAGATTGGAATGTTCTAT 

CATTGGTTTGTTGGATTAACAACGTGGCATTGTTTACACTCTGGCCATTAC 

TGAAAAAGGACAATCTAGTATTGCAATATGGAGTCATGTTCATGTTTAGC 

AATTGGTTGATCGGTAACTTCAGTTTCGTCACACCACGCTTCCTCCCAAAA 

TTTTTGACACCAGGGCCATCCATCAGTGATATAGATGTTGATTATAGACGG 

GCAAGTTTACTACCCAAGAGCCTAATATGGAGATTAATCATTGTTGGCTCA 

TATATTGCAATGGGGATTATTCATTTTCTAGACTATTACGTCTCCCCGCCA 

TCAAAATACCCTGATTTATGGGTGCTTGCCAATTGTTCCTTGGGCTTCTCA 

TGTTTTGTGACATTTTGGATATGGAACAATTATAATTATTCGAAATGAGAA 

ACAGCACTTTGCAAGATTTA 



K. lactis Alg6p

ISVSTAIJ^GSFGPIYIFGGYKNLVQSMHRIFPFARGIFEDKVANFWCVSNIFIK 

YIUSfLFTQKDLQLYSLIATVIGL^ 

VHEKTTLIJ'LLPriXLYTSRDWNVIJSLV^ 

VMFMFSNWLIGNFSFVTPRFIPKFLTPGPSISDIDVDYRRA5LLPKSL 

GSYIAMGEHFLDYYVSPPSKYPDLWVLANCSLGFSCFVTTWTWNNYNYSKZE 

TALCKI 



FIGURE 29 (sheet 1) 



K. lactis ALG6 BLAST



                                                                  Score     E
Sequences producing significant alignments:                       (bits)  Value

gi|1420090|emb|CAA99190.1| ORF YOR002w [Saccharomyces cerev...      392   e-108
gi|7490584|pir||T40396 glucosyl transferase - fission yeast ...     187   2e-46
gi|15240920|ref|NP_198662.1| glucosyl transferase-like prote...     117   2e-25
gi|7019325|ref|NP_037471.1| dolichyl-P-Glc:Man9GlcNAc2-PP-d...      103   2e-21
gi|12002040|gb|AAG43163.1|AF063604_1 brain my046 protein [H...      102   8e-21
gi|19921070|ref|NP_609393.1| CG5091-PA [Drosophila melanoga...      101   1e-20



Alignments 



S. cerevisiae
Score = 392 bits (1006), Expect = e-108
Identities = 182/280 (65%), Positives = 218/280 (77%), Gaps = 1/280 (0%)
Frame = +1

Query: 1 I SVSTAIiAFIGS FGP I YI FGG- YKNLVQSMHRI FPFARGI FEDKVANFWCVSNT FI KYRN 177 

1+ +T F F P+Y GG KN+'Q +HRI FPFARGI FEDKVANFWCV+N+ F+ KY+ 
Sbjct: 265 I AFATLATFAI I FAPLYFIiGGGLKSIIHQCIHRI FPFARGI FEDKVAWFWCVTNVFVKYKE 324 

Query: 178 LFTQKDI^LYSLLATVIGLLPSFIITFLYPK^ 357

FT + LQLYSlH-ATVTG LP+ I+T L+PK+HLLPY L ACSMSFFLFSFQVHEK 
Sbjct: 325 RFTIQQLQLYSLIATVIGFTiPAMIMTLLHPKX^ 3 84 

Query: 358 537 

, Y+S DWNVLSLV WINNVALFTLWPLLKKD L LQY V F+ SNWLIGKTFSF 

Sbjct: 385 PLLPITLLYSSTDWNVLSLVSWINOTALFTLWPLLKKDGLHLQYA 444 

Query: 538 VTPRFLPKFLTPGPSISDIDVDYRRASLIiPKSLIWRLIIVGSYIAMGIIHFLDYYVSPPS 717 

+TPRFLPK LTPGPSIS 1+ DYRR SLLP +++W+ I+G+YIAMG HFLD +V+PPS 
Sbjct: 445 ITPRFLPKSLTPGPSISSINSDYRRRSLLPYNWWKSFIIGTYIAMGFYHFLDQFVAPPS 504 

Query: 718 KYPDLWVLANCSLGFSCFVTFWIWNNYXIjFEMRNSTLQDL 837 

KYPDLWVL NC++GF CF FW+W+ Y +F. + +++DL 
Sbjct: 505 KYPDIiWVTiLNCAVGF I CFS I FWLWS YYKI FTS GS KSMKD L 544 



S. pombe 

Score = 187 bits (475), Expect = 2e-46
Identities = 106/280 (37%), Positives = 150/280 (53%), Gaps = 1/280 (0%)
Frame = +1 

Query: 1 I SVSTAIiAFIGSFGPIYIFGGYKNLV-QSMHRI FPFARGI FEDKVANFWCVSNIFIKYRN 177 

+SV+ F P +1+ YK L+ Q +HR+FPFARG++EDKVANFWC N K R 

Sbjct: 251 LSVTVVFTFSLILFP-WIYMDYKTLLPQII^ 309 

Query: 178 LFTQKDLQLYSLLATVI GLLPSFI I TFLYPKRHIiLPYALAACSMSFFLFSFQVHEKXXXX 357 

+FT LQ+ SL+ T+I +LPS +1 FLYP++ LL A+ S FFLFS FQVHEK 
Sbjct: 310 VFTLHQLQVT SLI FTLI S I LPS CVILFLYPRKRIiLALGFAS ASWGFFLFSFQVHEKKVLL 369

XXXXXXXXYTSRDWNVLSLVCW^ 537

+ + NN+A+F+LWPLLKKD L LQY + + NW 
PLLPTSILUHGNITTKPWIALANNIA^ 422 

WPRFIiPKFLTPGPSISDIDVDYRR^LLPKSLIWRLI IVGSYIAMGI IHFLDYYVS PPS 717 

I D+ V K++++R I + Y+ M +1 +D ++ PPS 
IGDMW FSKNVLFRFIQIiSFYVGMrVTLGIDLFIPPPS 460 

KYPDLWVLANCS LGFS CFVTFWI WNNYXIiFEMRNSTLQDL 837 
+YPDLWV+ N +L F+ F T ++W h + + DL 
RYPDLWVT LNVTLS FAGFFT I YLWTLGRLLHI S S KXiSTDL 500 



A. thaliana 
Score = 117 bits (292), Expect = 2e-25

Identities = 81/240 (33%), Positives = 120/240 (50%), Gaps = 2/240 (0%) 
Frame = +1 

Query: 85 MHRI FPFARGI FEDKVANFWCVSNTFI KYRNliFTQKDLQLYS LLATVI GIjIiPSFI ITFLY 264 

+ R+ PF RGI+ED VANFWC ++I IK++NLFT + L+ SL AT++ LPS + L 
Sbjct: 296 LSRIJ^FERGIYEDYVANFWCTTSILIKWKNLFTTO 355 

Query: 265 PKRHLLPYALAACSMSFFLFSFQVHEKX^^ 444 

p Y L SM+F+LFSFQVHEK + h + ALF 
Sbjct: 356 PSNEGFLYGLIjNSSMAFYLFSFQVHEKSILMPFLSATLLALK^ ALF 411 

Query: 445 TLWPLLKKDNLV^YGVMF^ 618 

+4-+PLL +D IH-+ Y + SF+ F + +PG +1 DV + 
Sbjct: 412 SMFPUjCRDKLLI PYLTL - - SFL FTVIYHSPGNHHAJQKTDVSFFSFK 457 

Query: 619 LLPKSLIWRLI IVGSYIAMGI IHFLDYYVSPPSKYPDLWVIANCSIiGFSCFVTFWIVmNY 798 

p + L+ +I++ ++H L + PP KYP L+ L FS F+ F + NY 

Sbjct: 458 NFPGYVF - - IiLRTHFF I SV- VIiHVLYLTI KPPQKOTFLFEALIMILCFSYFIMFAFYTNY 514 

H. sapiens
Score = 103 bits (258), Expect = 2e-21
Identities = 78/266 (29%), Positives = 123/266 (46%), Gaps = 3/266 (1%)
Frame = +1




Query: 7 VSTAIiAFIGSFGPIYI - - FGGYKNLVQSMHRI FPFARGI FEDKVANFWCVSNI FIKYRND 180
V A + SF ++ F + +Q + R+FP RG+FEDKVAN WC N+F+K +++
Sbjct: 232 VTO^CIWASFVLCWLPFFTEREQTIjQVLRRL^ 291

Query: 181 FTQKDLQL YSLLATVI GLLPS FI I TFLYPKRHUjP YALAACSMSFFIiFS FQVHEKXXXXX 360
+ + S T + LLP+ 1 LP + L +C++SFFLFSFQVHEK
Sbjct: 292 LPRHIQLIMSFCFTFLSliLPACIKLILQPSSKGFKFTLVSCALSFFLFSFQVHEKSILLV 351

Query: 361 XXXXXXX YTSRDWNVLSIiVCW I NNVAL FTLWPLIiKKDNLVLQ YGVMFM- FSNWLIGNFSF 537
+ + + w V+ F++ PLL KD L++ V M F + +FS
Sbjct: 352 SLPVCLVLS EIPFMSTWFIiLVSTFSI^PLLLKDELIWSVVTTMAFFIACVTSFSI 407

Query: 538 VTPRFLPKFLTPGPS I SD IDVDYRRAS LLPKS L IWRLI I VGS YI AMG 1 I HFLD YYVS PPS 717
+ SIS V SI+++SIM+++ +PP
Sbjct: 408 FEKTSEEELQLKSFSIS VRKYIiPCFTFLS RI I Q YLFLI SVI TMVLLTLMTVTLDP PQ 464

Query: 718 KYPDLWVXANCSIiGFSCFVTFWIWNN 795
K PDL+ + C + F+ F ++ N
Sbjct: 465 KLPDLFSVLVCFVSCLNFLFFLVYFN 490

a-factor
TCATTCAAACTGAAAACAAAA.CAGGAAGAGGG

GGGGCCGATCCTAAACCAA.TTAATTTATTTATTTGGGAGGATGGGGGCGGGCTCGGG 
AGGGAGGAGAGGGGTTGAACAGTTTCCTTTTGTTCCTCACTGTTAATTCGCCCACCT 
TCGGGCCCTTCTTGTTCTGCAGCGCCAAGCAGGGTGCAGAGGGGCTGTGGCTTGC^ 
GAGGGGCC^CTGTGGGGCTTCACTCCTGGTCAC^GGTGGCAGCAGAGAA^GAGATG 
TCTATAAGCAGGGGGATGTAGCTC^GTTTGTAGAATGCTTGCATAGCATAAATGAAG 
TCCTGGGTTCCATCCCCAGCACC^ 

CCAAGCATTCTCCTTGGCTAC^TAACAAAAGCAAGGCCTrTGTCCCC^ 

TACAAGAGACCCTATCTGA.GAAAATTGTGGGGGGGAGGGGGGGGGAAAT 

AAACACAGCCAGTCACTGTCACTGCATTC 

GGCAGATAACAGCTAAAAGGCACATAACCTTGGTGGGGAAATAAATGCCTGTGGTGT 
CCTGAGGGCCCCACCAAGTTCCAAAAAAAAAAAA 



>gi|18997007|gb|AAL83249.1|AF474154_1 N-acetylglucosaminyltransferase V [Mus musculus]

MAFFSPWKLSSQKLGFFLVTFGFIWGMMLLHFTIQQRTQPESSSMLR^ 

IKALAEENRDVVDGPYAGVM^ 

STNSTTAVPSLVSLEONVADIINGVQEKC^ 

YADYGVDGTS CS FFI YLS EVENWCPRLPWRAKNP YEEADHNS LAE I RTDFNT LYGMM 
KiOKEEFRWMRLRIRRMADAWIQAIKSLA^ 

ETAFSGGPLGELVQWSDLITSLYLLGHDIRISASLAELKEIMKKWGNRSGCPTVGD 
RIVELIYIDIVGLAQFKKTLGPSWVHYQCMLRVLDSFGTEPEFNHASYAQ 
WGKWNLNPQQF YTMFPHTPDNS FLGFWEQHLNS SD IHHINE I KRQNQSLVYGKVDS 
FWKNKKIYLDIIHTYMEVHA 

LGFPYEGPAPLEAIANGCAFIJ^PKFNPPKSSKISrTDFFIGKPTLRELTSQHPYA 
GRPHVWTVDLNNREEVEDAVKAim 

QVIWPPLSALQVKLAEPGQSCKQVCQESQLICEPSFFQHLNKEro 

LYKD I LVP S F YPKS KHCVFQGDLLLF S CAGAHPTHQRI CPCRDF I KGQVALCKDCL